Abstract

The suboptimal performance of wavelets with regard to the approximation of multivariate data 
gave rise to new representation systems, specifically designed for data with anisotropic features. Some
prominent examples of these are given by ridgelets, curvelets, and shearlets, to name a few. 

The great variety of such so-called directional systems motivated the search for a common framework
that unites many of them under one roof and enables a simultaneous analysis, for example with respect
to approximation properties. Building on the concept of parabolic molecules, the recently introduced
framework of α-molecules does in fact include the previously mentioned systems. Until now, however, it
has been confined to the bivariate setting, whereas nowadays one often deals with higher-dimensional data.

This motivates the extension of this unifying theory to dimensions larger than 2, put forward in this 
work. In particular, we generalize the central result that the cross-Gramian of any two systems of 
α-molecules will to some extent be localized.

As an exemplary application, we investigate the sparse approximation of video signals, which are 
instances of 3D data. The multivariate theory allows us to derive almost optimal approximation rates 
for a large class of representation systems. 

Keywords: Wavelets, Shearlets, Anisotropic Scaling, α-Molecules, Multiscale Analysis, Nonlinear
Approximation. 

2010 MSC: 41A30, 41A63, 42C40 

1 Introduction 

One of the most influential modern developments in applied harmonic analysis has undeniably been the
introduction of wavelets [T31. Their construction is based on dilations and translations of a (finite) set
of generating functions $(g_\lambda)_\lambda \subset L^2(\mathbb{R}^d)$. By carefully choosing the generators, the resulting systems can
become frames or even orthonormal bases of the space $L^2(\mathbb{R}^d)$. Furthermore, additional properties can
be obtained, such as smoothness or compact support. Real-world applications of wavelets today include
data compression (e.g. JPEG2000) and restoration tasks in imaging sciences [T]. In the field of
PDEs, wavelets nowadays play a central role in solving elliptic equations [7].

The great success of wavelet systems - besides their elegant construction principle and available fast 
numerical implementations - rests upon the fact that they provide efficient multiscale representations for 
various types of data. In particular, they optimally sparsely approximate functions $f : \mathbb{R} \to \mathbb{C}$ which
are smooth apart from (a finite number of) point singularities, in the sense of fast decay of the $N$-term
approximation error. Since such singularities are the only ones that occur in ‘reasonable’ 1D data, we
can safely say that wavelets are optimal for approximating one-dimensional functions.

Moving up a dimension however, the situation changes completely. Two-dimensional data may well 
have singularities along curves and is often governed by such anisotropic features - think of edges in 
images for instance. A widely used model for such data is the class of cartoon-like functions [H], i.e. 
functions which are smooth except for a curve-like singularity (see Section [4] for details). For this class, 
wavelets do not perform optimally any more m, and hence other approaches have to be considered. 
1.1 Directional Representation Systems

The non-optimal performance of wavelets in a multivariate setting is due to their isotropic
scaling law, which is not well suited for resolving anisotropic, i.e. directional, features. Therefore,
many systems employing some form of anisotropic scaling, and thus better suited for this task, have been
considered in recent years. In the following, we briefly recall a few of these constructions for motivation;
this is by no means meant to be a complete overview.


1.1.1 Ridgelets 

Aiming to approximate functions with line singularities, Candès defined so-called ridgelets [2] as translated,
rotated and dilated versions of a ridge function, which is constant orthogonal to some specified
direction $\eta \in \mathbb{S}^1$. Since such functions are not square-integrable, the concept was adjusted by Donoho [3],
allowing ridgelets a slow decay orthogonal to the $\eta$-direction, leading to a modified notion adopted for
instance in [231 [Ml [25]. In the new sense, a system of ridgelets is constructed by performing rotations,
translations and directional scaling on a generator $g \in L^2(\mathbb{R}^2)$ with corresponding scaling matrix

$$\begin{pmatrix} s & 0 \\ 0 & 1 \end{pmatrix}, \quad s > 0.$$

Tight ridgelet frames of this type were constructed e.g. in [MlIMj .


1.1.2 Curvelets and Shearlets 

A true breakthrough was achieved by Candès and Donoho in 2002 with the introduction of curvelets [5],
the first system to provide a provably (almost) optimal approximation rate for a certain class of cartoon-like
functions. Again, the idea is to apply certain rotation, translation and scaling operations to a
generating function. The major novelty was the use of parabolic scaling, a compromise between the directional
scaling used for ridgelets and the isotropic scaling used for wavelets, described by a matrix of the form

$$\begin{pmatrix} s & 0 \\ 0 & s^{1/2} \end{pmatrix}, \quad s > 0. \tag{1}$$

This type of scaling is specifically adapted to data with $C^2$-discontinuity curves, since it leaves the
parabola invariant and produces functions with essential support in a rectangle of size ‘width ≈ length$^2$’.
We mention that in the actual construction of the classical tight frame of curvelets [5], the translations
and rotations are applied to a set of generators, related to each other by a parabolic scaling law realised
not by (1) but by dilations with respect to polar coordinates.

A few years after curvelets, in 2005, shearlets were developed mainly by Kutyniok, Labate, Lim, and
Weiss gsj. They also scale parabolically and feature the same celebrated approximation properties as
curvelets for cartoon-like functions [sniisHj. The main difference is that shearings and not rotations are
used for the change of direction. The choice of shears makes shearlets more adapted to a digital grid,
since shearings given by the matrices

$$S_h = \begin{pmatrix} 1 & h \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad S_h^T = \begin{pmatrix} 1 & 0 \\ h & 1 \end{pmatrix}, \quad h \in \mathbb{R}, \tag{2}$$

leave the digital grid invariant. This is favorable in a discrete setting and bears the advantage of a
unified treatment of the continuum and digital realm. It should be noted that some actual constructions
of shearlet systems are not entirely faithful to the original idea of applying shears, translations, and
parabolic scalings to a single generator: Probably the most notable and widely used adjustment is the
idea of cone-adaption, where several generators with different orientations are used in order to avoid large
shear parameters. We will discuss this strategy in greater detail later in the article.
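
The grid invariance of shears can be checked numerically. The following Python sketch is only an illustration: the helper names `shear`, `rotate` and `on_grid` are hypothetical and do not belong to any shearlet library, the shear parameter is assumed to be an integer, and the convention $S_h = \begin{pmatrix} 1 & h \\ 0 & 1 \end{pmatrix}$ is assumed. It applies the shear to points of the integer lattice and contrasts the result with a rotation by 45 degrees, which moves lattice points off the grid.

```python
import math

def shear(point, h):
    """Apply S_h = [[1, h], [0, 1]] to a point (x1, x2)."""
    x1, x2 = point
    return (x1 + h * x2, x2)

def rotate(point, angle):
    """Rotate a point counterclockwise by `angle` (in radians)."""
    x1, x2 = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x1 - s * x2, s * x1 + c * x2)

def on_grid(point, tol=1e-12):
    """True if both coordinates are integers up to the tolerance `tol`."""
    return all(abs(v - round(v)) < tol for v in point)

# A small patch of the integer lattice Z^2.
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]

sheared = [shear(p, 3) for p in grid]              # integer shear parameter h = 3
rotated = [rotate(p, math.pi / 4) for p in grid]   # rotation by 45 degrees

print(all(on_grid(p) for p in sheared))   # every sheared point is a lattice point
print(all(on_grid(p) for p in rotated))   # some rotated points leave the lattice
```

For a non-integer shear parameter the first check would in general fail as well, which is why the sketch restricts itself to integer values of `h`.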

Nowadays, shearlets are a widely used directional representation system with applications ranging
from imaging science [16] and simulations of inverse scattering problems [43] to solvers for transport equations
[12]. For more information we refer to the book [39].
1.1.3 α-Curvelets (and -Shearlets)

The systems we have presented so far all utilize different versions of the scaling matrix

$$A_{\alpha,s} = \begin{pmatrix} s & 0 \\ 0 & s^{\alpha} \end{pmatrix}, \quad s > 0, \tag{3}$$

where the parameter $\alpha \in [0,1]$ specifies the degree of anisotropy in the scaling: $\alpha = 1$ corresponds to
wavelets, $\alpha = \frac{1}{2}$ to curvelets and shearlets, and $\alpha = 0$ to ridgelets. This observation was used in [25] to
define α-curvelets, and associated bandlimited tight frames were constructed for every $\alpha \in [0,1]$. Similar
to α-curvelets, the notion of a shearlet can be generalized to comprise α-scaling. The resulting α-shearlets
have been defined and examined in [33 HT] (for the range $\alpha \in [\frac{1}{2}, 1)$).
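
As a quick numerical illustration of the three special cases, the scaling matrix $\operatorname{diag}(s, s^{\alpha})$ can be evaluated at a fixed scale. The value $s = 16$ below is chosen purely for convenience and does not appear in the text.

```latex
% alpha-scaling diag(s, s^alpha) evaluated at the (arbitrarily chosen) scale s = 16
\alpha = 1:\quad
\begin{pmatrix} 16 & 0 \\ 0 & 16 \end{pmatrix}
\ \text{(isotropic, wavelets)}, \qquad
\alpha = \tfrac{1}{2}:\quad
\begin{pmatrix} 16 & 0 \\ 0 & 4 \end{pmatrix}
\ \text{(parabolic, curvelets and shearlets)}, \qquad
\alpha = 0:\quad
\begin{pmatrix} 16 & 0 \\ 0 & 1 \end{pmatrix}
\ \text{(directional, ridgelets)}.
```

The smaller $\alpha$ is, the more strongly the two coordinate directions are treated differently at the same scale.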

1.2 A Common Framework

The directional systems described above are all constructed using the same idea: start with a set of
generators, and then perform scalings (with some degree of anisotropy), changes of direction (e.g.
rotations or shears) and translations. Further, in order to obtain systems with desirable properties, some
regularity condition has to be imposed on the generators. With this in mind, it seems possible to regard
all such systems as instances of a more general concept.

1.2.1 Parabolic Molecules 

In 2011, Grohs and Kutyniok introduced the concept of parabolic molecules m, which allows one to derive
classical curvelets and shearlets as special instances of the same general construction process. Starting
from a set of generators, a system of parabolic molecules is obtained via parabolic dilations, rotations,
and translations. The essential novelty is that the generators can, apart from a certain time-frequency
localization, be chosen freely and each function may have its own generator. Together with the utilization
of so-called parametrizations to allow generic indexing, the ‘variability’ of the generators provides
the flexibility to cast rotation- and shear-based systems as instances of one unifying construction principle.
Moreover, it becomes possible to relax the vanishing moment conditions (important for high
approximation rates) imposed on the generators. Rather than demanding rigid conditions as in most classical
constructions, it suffices to require the moments of the variable generators to vanish asymptotically at
high scales, without changing the asymptotic approximation behavior.

1.2.2 α-Molecules

The scope of parabolic molecules is limited to parabolically scaled systems, which is why a major generalization
was pursued in [28], namely the extension to α-molecules. These incorporate the more general
α-scaling (3) and can thus bridge the gap between wavelets and ridgelets, with curvelets and shearlets
in between.

However, like the framework of parabolic molecules, they are confined to a 2-dimensional setting.
Since higher-dimensional data plays an ever increasing role nowadays, an extension of the theory to
higher dimensions is desirable. A first step in this direction was taken by one of the authors [20]
with an extension of the parabolic molecules framework to 3D. In this paper, we aim to generalize the
framework to arbitrary dimensions $d \in \mathbb{N}$, $d \geq 2$, and general scaling parameters $\alpha \in [0,1]$.

1.2.3 Why α-Molecules?

The concept of (multivariate) α-molecules covers a great variety of directional multiscale systems and
unifies their treatment and analysis, e.g. with respect to approximation properties. The foundational
result behind this is Theorem 2.5, i.e. the fact that the localization of the cross-Gramian of two systems
of α-molecules - in the sense of a strong off-diagonal decay - merely depends on their respective
parametrizations and orders. Hence, the parametrization and the order of a system of α-molecules alone
are sufficient information to determine the corresponding approximation behavior. This is illustrated by
Theorem 4.4, where a large class of directional representation systems is specified with almost optimal
approximation performance with respect to cartoon video data. Finally, we remark that apart from the
analysis aspects the framework also promises new design approaches for novel constructions.

1.3 Our Contributions 

As mentioned before, the goal of this paper is to generalize the concept of α-molecules to arbitrary
dimensions. The multivariate formulation extends the earlier results from [23 [2H1 ED] and gives valuable
insights, e.g. on how they scale with the dimension. One should emphasize that the extension beyond
dimension 2 comes with several subtleties, such as determining a suitable definition of the so-called α-scale
index distance. The technical effort to prove the mentioned results is considerably higher when
dealing with more than 2 dimensions, mainly because many arguments take place on the unit sphere
instead of the unit circle. It also gets significantly harder to prove that shearlets, i.e. the main example
of multidimensional directional systems in dimensions higher than 2, can be included in the framework.
Hence already the core results of the theoretical framework themselves do not generalize straightforwardly.

1.4 Outline 

The paper is organized as follows. The core part of the theory, in particular Theorem 2.5, is presented
in Section [2]. Since the corresponding proof is quite involved, it is deferred to Section [6]. The abstract
theory is further developed in Section [3] with a focus on approximation theory. Here Theorem 3.7
derives sufficient conditions for two systems of α-molecules to be sparsity equivalent. In Section [4] we
then apply the theory to video data as an example and identify a large class of representation systems
providing almost optimal sparse approximation in Theorem 4.4. Finally, Section [5] is devoted to a large
class of concrete systems of α-molecules, namely multivariate α-shearlet molecules. As specific examples
we present pyramid-adapted shearlet systems, in particular those generated by compactly supported
functions and the smooth Parseval frame of band-limited shearlets by Guo and Labate. This frame is
presented in greater detail in Subsection 5.3.1, thereby fixing some inaccuracies of the original definition.

1.5 Notation 

The (strictly) positive real numbers are denoted by $\mathbb{R}_+$. The vector space $\mathbb{R}^d$, where $d \in \mathbb{N}$, is equipped
with the usual Euclidean scalar product denoted by $\langle \cdot, \cdot \rangle$. Here $\mathbb{N}$ denotes the set of positive integers,
while $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. For the unit sphere in $\mathbb{R}^d$ the symbol $\mathbb{S}^{d-1}$ is used. The standard unit vectors are
given by $e_1, \ldots, e_d$, and for a vector $x \in \mathbb{R}^d$ we use the notation $[x]_i := \langle x, e_i \rangle$, $i \in \{1, \ldots, d\}$, for the $i$th
component. Its $p$-(quasi-)norm in the range $0 < p < \infty$ is denoted by $|x|_p$. In case of the Euclidean norm
$|x|_2 = \sqrt{\langle x, x \rangle}$, we will usually omit the subindex. We further define $|x|_{[d-1]} := |([x]_1, \ldots, [x]_{d-1}, 0)^T|_2$.

For a matrix $M \in \mathbb{R}^{m \times n}$, we denote its operator norm as a mapping from the Euclidean $\mathbb{R}^n$ to the
Euclidean $\mathbb{R}^m$ by $\|M\|_{2 \to 2}$, and its entries by $[M]_{i,j}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$.

The usual Lebesgue spaces on $\mathbb{R}^d$ are denoted by $L^p(\mathbb{R}^d)$, where $0 < p < \infty$. The corresponding
sequence spaces are given by $\ell^p(\Lambda)$, where $\Lambda$ is a countable index set. In both cases we use the symbol
$\|\cdot\|_p$ for the associated (quasi-)norms. Further, the symbol $\langle \cdot, \cdot \rangle$ will also be used for the inner products
on the Hilbert spaces $L^2(\mathbb{R}^d)$ and $\ell^2(\Lambda)$. For the weak versions of the sequence spaces we use the notation
$\omega\ell^p(\Lambda)$ with associated (quasi-)norms $\|\cdot\|_{\omega\ell^p}$. For their definition we refer to Subsection 3.1.

In addition, we need the following function spaces on $\mathbb{R}^d$: the space of continuous functions $C(\mathbb{R}^d)$, the
space of $n$-times continuously differentiable functions $C^n(\mathbb{R}^d)$ for $n \in \mathbb{N} \cup \{\infty\}$, as well as their respective
restrictions $C_c(\mathbb{R}^d)$ and $C_c^n(\mathbb{R}^d)$ to functions with compact support.

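The Fourier transform $\hat{f}$ of a function $f$ in the space $\mathcal{S}(\mathbb{R}^d)$ of Schwartz functions is given by

$$\hat{f}(\xi) = \int_{\mathbb{R}^d} f(x) \exp(-2\pi i \langle \xi, x \rangle) \, dx.$$

As usual, it extends to the space $\mathcal{S}'(\mathbb{R}^d)$ of tempered distributions.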

For two entities $x, y$, usually dependent on a certain set of parameters, the notation ‘$x \lesssim y$’ shall
mean that $x \le Cy$ for some fixed constant $C > 0$, which is independent of the involved parameters.
If both $x \lesssim y$ and $y \lesssim x$ we write ‘$x \asymp y$’. We further need the ceiling function on $\mathbb{R}$ given by
$\lceil x \rceil := \min\{\ell \in \mathbb{Z} : \ell \ge x\}$. A useful abbreviation is also the ubiquitous ‘analyst’s bracket’ defined by
$\langle x \rangle := \sqrt{1 + x^2}$ for $x \in \mathbb{R}$.


2 α-Molecules in d Dimensions


Recalling the definition in [28], a system of bivariate α-molecules consists of functions in $L^2(\mathbb{R}^2)$ obtained
by applying α-scaling, rotations, and translations to a set of generating functions, which need to be
sufficiently localized in time and frequency. Due to this construction, every α-molecule is naturally
associated with a certain scale, orientation and spatial position, which - in the 2-dimensional case - is
conveniently represented by a point in the corresponding parameter space $\mathbb{P}_2 = \mathbb{R}_+ \times \mathbb{S}^1 \times \mathbb{R}^2$.

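Aiming for a multivariate generalization, we thus first need a $d$-dimensional version of this parameter
space. We let $\mathbb{S}^{d-1}$ denote the unit sphere in $\mathbb{R}^d$ and put

$$\mathbb{P}_d = \mathbb{R}_+ \times \mathbb{S}^{d-1} \times \mathbb{R}^d.$$

Each function $m_\lambda \in L^2(\mathbb{R}^d)$ of a system $(m_\lambda)_{\lambda \in \Lambda}$ of $d$-dimensional α-molecules shall by definition then
be associated with a unique point $(s_\lambda, e_\lambda, x_\lambda) \in \mathbb{P}_d$, where the variable $s_\lambda \in \mathbb{R}_+$ shall represent its scale,
the vector $e_\lambda \in \mathbb{S}^{d-1}$ its orientation in $\mathbb{R}^d$, and $x_\lambda \in \mathbb{R}^d$ the spatial location. The relation between the
index $\lambda$ of a molecule $m_\lambda$ and its position $(s_\lambda, e_\lambda, x_\lambda)$ in $\mathbb{P}_d$ is described by a so-called parametrization,
analogous to [55].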


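Definition 2.1. A parametrization consists of a pair $(\Lambda, \Phi_\Lambda)$, where $\Lambda$ is a discrete index set and $\Phi_\Lambda$ is
a mapping

$$\Phi_\Lambda : \Lambda \to \mathbb{P}_d, \quad \lambda \mapsto (s_\lambda, e_\lambda, x_\lambda),$$

which associates with each $\lambda \in \Lambda$ a scale $s_\lambda \in \mathbb{R}_+$, a direction $e_\lambda \in \mathbb{S}^{d-1}$, and a location $x_\lambda \in \mathbb{R}^d$.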

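For practical purposes it is more convenient to represent an orientation $\eta \in \mathbb{S}^{d-1}$ by a set of angles.
Therefore we define the rotation matrix $R_\theta$ for $\theta = (\theta_1, \ldots, \theta_{d-2}) \in \mathbb{R}^{d-2}$ by

$$R_\theta = \begin{pmatrix} \cos(\theta_1) & & -\sin(\theta_1) \\ & I_{d-2} & \\ \sin(\theta_1) & & \cos(\theta_1) \end{pmatrix} \cdots \begin{pmatrix} \cos(\theta_{d-2}) & & -\sin(\theta_{d-2}) & \\ & 1 & & \\ \sin(\theta_{d-2}) & & \cos(\theta_{d-2}) & \\ & & & I_{d-3} \end{pmatrix},$$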


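where $I_d$ for $d \in \mathbb{N}$ denotes the $d$-dimensional identity matrix. Furthermore, we introduce for $\varphi \in \mathbb{R}$ the
matrix

$$R_\varphi = \begin{pmatrix} \cos(\varphi) & \sin(\varphi) & \\ -\sin(\varphi) & \cos(\varphi) & \\ & & I_{d-2} \end{pmatrix}.$$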


Note that these definitions pose an inconsistency in the notation, since they depend on the particular 
naming of the index. However, since we always use these particular indices, this will not lead to any 
problems while improving the readability significantly. 

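Each orientation $\eta \in \mathbb{S}^{d-1}$ can now be uniquely represented by a set of angles
$(\theta_1, \ldots, \theta_{d-2}, \varphi) \in [0, \pi] \times [-\frac{\pi}{2}, \frac{\pi}{2}]^{d-3} \times [0, 2\pi]$ via the relation

$$\eta = R_\varphi^T R_\theta^T e_d, \tag{4}$$

where $e_d$ is the $d$th unit vector of $\mathbb{R}^d$. Explicitly, it is given by

$$\eta(\theta, \varphi) = \begin{pmatrix} \eta_1(\theta, \varphi) \\ \vdots \\ \eta_d(\theta, \varphi) \end{pmatrix} = \begin{pmatrix} \cos(\varphi) \cos(\theta_{d-2}) \cdots \cos(\theta_2) \sin(\theta_1) \\ \sin(\varphi) \cos(\theta_{d-2}) \cdots \cos(\theta_2) \sin(\theta_1) \\ -\sin(\theta_{d-2}) \cos(\theta_{d-3}) \cdots \cos(\theta_2) \sin(\theta_1) \\ \vdots \\ -\sin(\theta_3) \cos(\theta_2) \sin(\theta_1) \\ -\sin(\theta_2) \sin(\theta_1) \\ \cos(\theta_1) \end{pmatrix}.$$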
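
The explicit angle representation can be evaluated directly. The following Python sketch is a hedged illustration: the function name `orientation` is hypothetical, the dimension is assumed to be $d \ge 3$, and the component formula used is the one read off from the explicit expression above (last component $\cos(\theta_1)$, first two components carrying $\cos(\varphi)$ and $\sin(\varphi)$). It computes $\eta(\theta, \varphi)$ and allows one to check that the result has unit length.

```python
import math

def orientation(thetas, phi):
    """Return eta(theta, phi) in R^d, where d = len(thetas) + 2 and d >= 3.

    thetas = [theta_1, ..., theta_{d-2}], phi is the remaining angle.
    """
    d = len(thetas) + 2
    eta = [0.0] * d
    eta[d - 1] = math.cos(thetas[0])       # d-th component: cos(theta_1)
    prefix = math.sin(thetas[0])           # sin(theta_1) * cos(theta_2) * ... built up step by step
    for k in range(2, d - 1):              # k = 2, ..., d-2
        # component d+1-k (1-based): -sin(theta_k) cos(theta_{k-1}) ... cos(theta_2) sin(theta_1)
        eta[d - k] = -math.sin(thetas[k - 1]) * prefix
        prefix *= math.cos(thetas[k - 1])
    eta[0] = math.cos(phi) * prefix        # first component
    eta[1] = math.sin(phi) * prefix        # second component
    return eta

# d = 3 reduces to the usual spherical coordinates.
print(orientation([math.pi / 2], 0.0))

# d = 5: the resulting vector lies on the unit sphere.
eta = orientation([0.7, -0.3, 0.4], 2.0)
print(math.sqrt(sum(v * v for v in eta)))
```

For $d = 3$ the sketch returns $(\cos\varphi \sin\theta_1, \sin\varphi \sin\theta_1, \cos\theta_1)$, so $\theta_1 = 0$ yields the distinguished direction $e_d$ regardless of $\varphi$.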
Next, we adapt the α-scaling matrix to the multivariate setting. For $\alpha \in [0,1]$ and $s > 0$, we set

$$A_{\alpha,s} = \begin{pmatrix} s^{\alpha} I_{d-1} & 0 \\ 0 & s \end{pmatrix}. \tag{5}$$


In case $\alpha = 1$ this matrix scales isotropically; in the range $\alpha \in [0,1)$ it scales uniformly in all directions
except for the $e_d$-direction. Note that here we choose $e_d$ as the distinguished direction in which the scaling
is stronger - in contrast to the 2-dimensional case [28], where $e_1$ was chosen.

After this preparation we are ready to give the definition of $d$-variate α-molecules, $d \in \mathbb{N}$, $d \ge 2$, which
essentially reduces to the original definition from [28] for $d = 2$, except for the interchanged roles of the
directions $e_1$ and $e_d$.

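Definition 2.2. Let $\alpha \in [0,1]$, $d \in \mathbb{N}$, $d \ge 2$ and $L, M, N_1, N_2 \in \mathbb{N}_0 \cup \{\infty\}$. Further let $(\Lambda, \Phi_\Lambda)$ be a
parametrization with $\Phi_\Lambda(\lambda) = (s_\lambda, e_\lambda, x_\lambda) \in \mathbb{P}_d$ for $\lambda \in \Lambda$. The corresponding angles (4) for $e_\lambda$ shall
be denoted by $(\theta_\lambda, \varphi_\lambda)$. A family of functions $(m_\lambda)_{\lambda \in \Lambda} \subset L^2(\mathbb{R}^d)$ is called a system of $d$-dimensional
α-molecules of order $(L, M, N_1, N_2)$ with respect to the parametrization $(\Lambda, \Phi_\Lambda)$, if each $m_\lambda$ is of the
form

$$m_\lambda = s_\lambda^{\frac{1 + \alpha(d-1)}{2}} \, g_\lambda\big(A_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} (\cdot - x_\lambda)\big) \tag{6}$$

with generators $g_\lambda \in L^2(\mathbb{R}^d)$ satisfying for every multi-index $\rho \in \mathbb{N}_0^d$ with $|\rho|_1 \le L$ the condition

$$|\partial^\rho \hat{g}_\lambda(\xi)| \lesssim \min\big\{1, \, s_\lambda^{-1} + |[\xi]_d| + s_\lambda^{-(1-\alpha)} |\xi|_{[d-1]}\big\}^M \, \langle |\xi| \rangle^{-N_1} \, \langle |\xi|_{[d-1]} \rangle^{-N_2}. \tag{7}$$

The implicit constant in (7) is required to be uniform in $\lambda \in \Lambda$. In case that a control parameter takes the
value $\infty$, this shall mean that the condition (7) is fulfilled with the respective quantity arbitrarily large.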

A system of α-molecules is thus obtained by applying rotations, translations, and α-scaling to a set
of generating functions $(g_\lambda)_\lambda$, which are required to obey a prescribed time-frequency localization. Every
molecule $m_\lambda$ is thereby allowed to have its own individual generator $g_\lambda$.

The definition only poses conditions on the Fourier transform of the generators $g_\lambda$. The number $L$
describes the spatial localization, $M$ the number of directional (almost) vanishing moments, and $N_1, N_2$
the smoothness of an element $m_\lambda$. Also note that the weighting function on the right-hand side of (7) is
symmetric with respect to rotations around the $e_d$-axis, as well as reflections along this axis.

Applying $A_{\alpha,s}$ with $\alpha < 1$ and $s > 1$ to the unit ball $B = \{x \in \mathbb{R}^d : |x| \le 1\}$ stretches $B$ in the
$e_d$-direction. This results in plate-like support of the characteristic function $\chi_B(A_{\alpha,s} \, \cdot)$ for large $s \in \mathbb{R}_+$,
with the ‘plate’ lying in the plane spanned by the vectors $\{e_1, \ldots, e_{d-1}\}$. Thus, at high scales α-molecules
can be thought of as plate-like objects in the spatial domain. The approximate frequency support on the
other hand is concentrated in a pair of opposite cones in the direction of the respective orientation.
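
The plate-like geometry can be made concrete with a small computation. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the scaling matrix is taken to be diagonal with entries $s^{\alpha}$ in the first $d-1$ directions and $s$ in the $e_d$-direction. It lists the diagonal of the scaling matrix, the normalization factor $s^{(1 + \alpha(d-1))/2}$ (the square root of the determinant), and the semi-axes of the ellipsoid $\{x : |A_{\alpha,s} x| \le 1\}$.

```python
def scaling_diagonal(alpha, s, d):
    """Diagonal of the scaling matrix: s**alpha in the first d-1 directions, s along e_d."""
    return [s ** alpha] * (d - 1) + [s]

def normalization(alpha, s, d):
    """Prefactor s^{(1 + alpha (d-1)) / 2}, i.e. the square root of the determinant."""
    return s ** ((1 + alpha * (d - 1)) / 2)

def support_semi_axes(alpha, s, d):
    """Semi-axes of {x : |A x| <= 1}, the support of the indicator of the unit ball composed with A."""
    return [1.0 / a for a in scaling_diagonal(alpha, s, d)]

# Parabolic scaling (alpha = 1/2) in dimension d = 3 at scale s = 16.
print(scaling_diagonal(0.5, 16, 3))    # entries s**alpha, s**alpha, s
print(normalization(0.5, 16, 3))       # square root of the determinant
print(support_semi_axes(0.5, 16, 3))   # wide in e_1, e_2 and thin in e_3: a plate
```

With these values the support has extent $1/4$ in the first two directions but only $1/16$ along $e_3$, and the anisotropy grows with $s$ as long as $\alpha < 1$.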

Remark 2.3. It may seem more natural to choose a rotation $R_\eta$ from $e_d$ to $\eta \in \mathbb{S}^{d-1}$ in the $(e_d, \eta)$-plane
to adjust the orientation in (6). Due to the symmetries of the weighting function of the generators, this
choice is however not necessary. Since it is easier to use fixed rotation planes, we stick to this more
pragmatic choice of rotation parameters.

Let us conclude this paragraph with some comments on the use of the term ‘molecule’. In the theory
of function spaces, ‘atoms’ originally refer to the basic building blocks of a function space. In the widest
sense, an atomic decomposition of a function space is a countable subset containing functions called
‘atoms’, which allow one to represent every function of the space as a countable linear combination.

In the theory of atomic decomposition of so-called Hardy spaces, an atom is defined as a function
$a$ supported in some cube $Q$ possessing vanishing moments and satisfying some norm bound, e.g.
||a|l 2 ^ \QV ^[H]- This definition has also been adapted in a slightly less rigid form in other parts 
of mathematics. Here, the term ‘atom’ often simply refers to a function having compact support and
possessing many vanishing moments. Typically, these atoms also possess additional properties, e.g. they 
may be bounded or fulfill smoothness conditions. 
Furthermore, in many instances, the atoms are also related to each other. Classical coorbit spaces, for example,
possess atomic decompositions with atoms obtained from the action of a group on a single generator [18].
A primary example are the (homogeneous) Besov-Triebel-Lizorkin spaces with (compactly supported)
wavelets as atoms.

A system of α-molecules somewhat resembles this structure, albeit in a relaxed form. The compact
support condition is replaced with a less restrictive decay condition, and the moments are only asymptotically
vanishing. In addition, the molecules are related to each other via certain group transformations,
namely translations, rotations, and dilations. However, this relation shall not be understood in a strict
sense, since the generators are allowed to vary to some extent. They just need to fulfill a uniform localization
condition. For the functions of a system featuring this kind of soft conditions the term ‘molecules’
was coined and used e.g. in n 131 mi 111].

2.1 Index Distance in d Dimensions 

A central ingredient of the theory of bivariate α-molecules [33] is the fact that the parameter space $\mathbb{P}_2$ can
be equipped with a natural (pseudo-)metric with the property that the distance between two points in
$\mathbb{P}_2$ ‘anti-correlates’ with the size of the scalar products of α-molecules associated with those points: The
greater the distance between two indices $\lambda, \lambda' \in \mathbb{P}_2$, the smaller the scalar product of the corresponding
α-molecules $m_\lambda, m_{\lambda'} \in L^2(\mathbb{R}^2)$.

Our next aim is to find a suitable analogue of this (pseudo-)metric for the parameter space $\mathbb{P}_d$. As
for $\mathbb{P}_2$, the distance between two points $(s_\lambda, e_\lambda, x_\lambda), (s_\mu, e_\mu, x_\mu) \in \mathbb{P}_d$ must certainly take into account
their spatial, scale, and orientational relation. The spatial distance is measured by a rightly balanced
combination of an isotropic term $|x_\lambda - x_\mu|^2$ and a non-isotropic component $|\langle e_\lambda, x_\lambda - x_\mu \rangle|$, which depends
on the orientation of the molecules. For the distance between the orientations $e_\lambda$ and $e_\mu$, it seems natural
to consider the angle $d_{\mathbb{S}}(e_\lambda, e_\mu) = \arccos(\langle e_\lambda, e_\mu \rangle)$ with $d_{\mathbb{S}}(e_\lambda, e_\mu) \in [0, \pi]$. Due to the symmetries of the
weighting function in (7) however, the angle $d_{\mathbb{S}}(e_\lambda, e_\mu)$ is projected onto the interval $[-\pi/2, \pi/2)$, with the
projected angle $\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}$ being the unique element of the set $\{d_{\mathbb{S}}(e_\lambda, e_\mu) + n\pi \mid n \in \mathbb{Z}\}$ in the interval
$[-\pi/2, \pi/2)$. A suitable measure for the orientational distance is then $|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2$. This definition is
in fact consistent with the one in [28], since in two dimensions we have $|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| = |\{\varphi_\lambda - \varphi_\mu\}|$.
Finally, the distance between different scales $s_\lambda, s_\mu > 0$ is measured by the ratio $\max\{s_\lambda / s_\mu, s_\mu / s_\lambda\}$.
Altogether, this leads to the following definition. It directly generalizes the metric introduced in [29],
which is a simplified version of the original metric from [28].

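Definition 2.4. Let $\alpha \in [0,1]$. For given parametrizations $(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$ with $(s_\lambda, e_\lambda, x_\lambda) = \Phi_\Lambda(\lambda)$
and $(s_\mu, e_\mu, x_\mu) = \Phi_\Delta(\mu)$, the α-scaled index distance $\omega_\alpha : \Lambda \times \Delta \to [1, \infty)$ is defined by

$$\omega_\alpha(\lambda, \mu) = \max\Big\{\frac{s_\lambda}{s_\mu}, \frac{s_\mu}{s_\lambda}\Big\} \big(1 + d_\alpha(\lambda, \mu)\big), \quad \lambda \in \Lambda, \ \mu \in \Delta,$$

where with $s_0 = \min\{s_\lambda, s_\mu\}$

$$d_\alpha(\lambda, \mu) = s_0^{2\alpha} |x_\lambda - x_\mu|^2 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0 |\langle e_\lambda, x_\lambda - x_\mu \rangle|.$$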

Let $(\Lambda, \Phi_\Lambda)$ be a parametrization. Then the induced index distance on $\Lambda$ is pseudosymmetric and
satisfies a pseudo triangle inequality. More precisely, it has the following properties:

(i) $\omega_\alpha(\lambda, \lambda) = 1$ for all $\lambda \in \Lambda$,

(ii) $\omega_\alpha(\lambda, \lambda') \asymp \omega_\alpha(\lambda', \lambda)$ for all $\lambda, \lambda' \in \Lambda$,

(iii) $\omega_\alpha(\lambda, \lambda') \lesssim \omega_\alpha(\lambda, \lambda'') \, \omega_\alpha(\lambda'', \lambda')$ for all $\lambda, \lambda', \lambda'' \in \Lambda$.

Hence the function $\omega_\alpha$ can be viewed as a kind of multiplicative pseudo-metric. A proof of these properties
for the 2-dimensional case can be found in [33]; it carries over to higher dimensions without difficulty.
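
The index distance is straightforward to evaluate for concrete points of the parameter space. The following Python sketch is a minimal illustration and not the authors' implementation: the function names are hypothetical, each point is passed as a tuple `(scale, orientation, location)`, and the formula assumed is $d_\alpha = s_0^{2\alpha} |x_\lambda - x_\mu|^2 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0 |\langle e_\lambda, x_\lambda - x_\mu \rangle|$ with $s_0 = \min\{s_\lambda, s_\mu\}$.

```python
import math

def projected_angle(e_lam, e_mu):
    """Angle between two unit vectors, shifted by a multiple of pi into [-pi/2, pi/2)."""
    dot = sum(a * b for a, b in zip(e_lam, e_mu))
    angle = math.acos(max(-1.0, min(1.0, dot)))   # clamp against rounding; angle in [0, pi]
    return angle - math.pi if angle >= math.pi / 2 else angle

def index_distance(alpha, lam, mu):
    """alpha-scaled index distance between two points (s, e, x) of the parameter space."""
    (s_lam, e_lam, x_lam), (s_mu, e_mu, x_mu) = lam, mu
    s0 = min(s_lam, s_mu)
    diff = [a - b for a, b in zip(x_lam, x_mu)]
    d_alpha = (s0 ** (2 * alpha) * sum(v * v for v in diff)                      # isotropic spatial term
               + s0 ** (2 * (1 - alpha)) * projected_angle(e_lam, e_mu) ** 2     # orientational term
               + s0 * abs(sum(a * b for a, b in zip(e_lam, diff))))              # anisotropic spatial term
    return max(s_lam / s_mu, s_mu / s_lam) * (1 + d_alpha)

lam = (4.0, (0.0, 0.0, 1.0), (0.0, 0.0, 0.0))
mu = (1.0, (0.0, 0.0, -1.0), (0.0, 0.0, 0.0))
print(index_distance(0.5, lam, lam))   # a point has distance 1 to itself
print(index_distance(0.5, lam, mu))    # antipodal orientations count as equal; only the scale ratio remains
```

The second call shows the effect of projecting the angle: the orientations $e_d$ and $-e_d$ are not distinguished, in line with the reflection symmetry of the weighting function.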

Now we are in a position to formulate the main theorem of this paper. It states that the index distance
- in a certain sense - measures the size of the scalar products of α-molecules.
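Theorem 2.5. Let $\alpha \in [0,1]$, $d \in \mathbb{N}$, $d \ge 2$, and let $(m_\lambda)_{\lambda \in \Lambda}$ and $(p_\mu)_{\mu \in \Delta}$ be two systems of $d$-dimensional
α-molecules of order $(L, M, N_1, N_2)$ with respect to parametrizations $(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$,
respectively. Further assume that there exists some constant $c > 0$ such that

$$s_\lambda \ge c \ \text{ and } \ s_\mu \ge c \quad \text{for all } \lambda \in \Lambda, \ \mu \in \Delta \ \text{ with } (s_\lambda, e_\lambda, x_\lambda) = \Phi_\Lambda(\lambda), \ (s_\mu, e_\mu, x_\mu) = \Phi_\Delta(\mu).$$

If $N_1 > \frac{d}{2}$ and if there exists some positive integer $N \in \mathbb{N}$ such that

$$L \ge 2N, \quad M > 3N - d + \frac{1 + \alpha(d-1)}{2}, \quad N_1 \ge N + \frac{1 + \alpha(d-1)}{2}, \quad N_2 \ge 2N + d - 2,$$

then we have

$$|\langle m_\lambda, p_\mu \rangle| \lesssim \omega_\alpha(\lambda, \mu)^{-N}.$$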

The proof of Theorem 2.5 is very long and technical and for this reason not presented here but in Section [6].
Let us instead discuss the significance of this result. It states that - with appropriate assumptions
on the parametrizations - the cross-Gramian of two systems of α-molecules is well-localized, in the sense
of a fast off-diagonal decay with respect to the index distance $\omega_\alpha$. Put differently, the matrix is close
to a diagonal matrix and the corresponding systems are almost orthogonal to each other. This property
has many implications, see for instance [521 HI]. Its significance with respect to sparsity equivalence is
elaborated in the next section.
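
To get a feeling for how the order of the molecules limits the achievable decay, one can compute the largest admissible exponent $N$ for a given order. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that the order conditions of Theorem 2.5 read $N_1 > d/2$, $L \ge 2N$, $M > 3N - d + \frac{1 + \alpha(d-1)}{2}$, $N_1 \ge N + \frac{1 + \alpha(d-1)}{2}$ and $N_2 \ge 2N + d - 2$.

```python
import math

def max_decay_exponent(L, M, N1, N2, alpha, d):
    """Largest positive integer N admitted by the assumed order conditions, or 0 if there is none."""
    half = (1 + alpha * (d - 1)) / 2
    if not N1 > d / 2:
        return 0
    if all(math.isinf(v) for v in (L, M, N1, N2)):
        return math.inf                      # order (inf, inf, inf, inf): every N is admissible

    def admissible(N):
        return (L >= 2 * N
                and M > 3 * N - d + half
                and N1 >= N + half
                and N2 >= 2 * N + d - 2)

    N = 0
    while admissible(N + 1):                 # the conditions are monotone in N, so this terminates
        N += 1
    return N

# Bivariate parabolic case (d = 2, alpha = 1/2) with order (4, 6, 3, 4).
print(max_decay_exponent(4, 6, 3, 4, alpha=0.5, d=2))
# Trivariate parabolic case (d = 3, alpha = 1/2) with order (2, 2, 2, 3).
print(max_decay_exponent(2, 2, 2, 3, alpha=0.5, d=3))
```

In both examples the spatial localization $L$ is the binding constraint; raising $L$ alone would allow a larger $N$ only until one of the other three conditions fails.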


3 Sparse Approximation with α-Molecules

Based on Theorem 2.5, it is possible to develop a general methodology to categorize frames of α-molecules
according to their sparse approximation behavior. A central concept in this context is the notion of
sparsity equivalence.

Another question that naturally arises in this context is whether it is possible to develop a theory of smoothness
spaces associated with frames of α-molecules. For such an investigation, coorbit space theory provides an
appropriate abstract framework. Usually, however, due to the lack of group structure, this question cannot
be handled within the classical theory developed in nattziiii]. In subsequent contributions, among
others uniiiiiii], the classical theory has seen significant extensions beyond the group setting. In fact,
it is possible to base the theory solely on the notion of an abstract continuous frame, see [21, 47, 46, 36].
In this general setup coorbits associated to frames of α-molecules can be defined and investigated.

Up to now, this has only been carried out to some extent and not yet systematically, i.e., for certain
special instances of α-molecules. A particular example are cone-adapted bivariate shearlet systems [44].
The authors believe that the development of a theory of α-molecule smoothness spaces is beyond the
scope of this paper, but certainly a very interesting possibility for future research. In this paper, we
concentrate on sparse approximations.

3.1 Sparse Approximation and Sparsity Equivalence 

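Let us briefly recall some aspects of approximation theory in a general (separable) Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$.
Utilizing a system $(m_\lambda)_{\lambda \in \Lambda} \subset \mathcal{H}$, a signal $f \in \mathcal{H}$ can be represented by the coefficients $c_\lambda \in \mathbb{C}$ of the
expansion

$$f = \sum_{\lambda \in \Lambda} c_\lambda m_\lambda. \tag{8}$$

Suitable representation systems are provided e.g. by so-called frame systems [6], which ensure stable
measurement of the coefficients and also stable reconstruction. A system $(m_\lambda)_{\lambda \in \Lambda}$ in $\mathcal{H}$ forms a frame if
there exist constants $A, B > 0$, called the frame bounds, such that

$$A \|f\|^2 \le \sum_{\lambda \in \Lambda} |\langle f, m_\lambda \rangle|^2 \le B \|f\|^2 \quad \text{for all } f \in \mathcal{H}.$$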
If $A$ and $B$ can be chosen equal, the frame is called tight. In case $A = B = 1$, one speaks of a Parseval
frame. The associated frame operator $S : \mathcal{H} \to \mathcal{H}$ is given by $Sf = \sum_{\lambda \in \Lambda} \langle f, m_\lambda \rangle m_\lambda$.

Since $S$ is always invertible, the system $(S^{-1} m_\lambda)_{\lambda \in \Lambda}$ is also a frame, referred to as the canonical dual
frame. It can be used to compute a particular sequence of coefficients in the expansion (8) via

$$c_\lambda = \langle f, S^{-1} m_\lambda \rangle, \quad \lambda \in \Lambda.$$

This sequence however is usually not the only one possible. Unlike the expansion in a basis, a representation
with respect to a frame need not be unique. The canonical dual frame can also be used to express
$f$ in terms of the frame coefficients $(\langle f, m_\lambda \rangle)_\lambda$ by

$$f = \sum_{\lambda \in \Lambda} \langle f, m_\lambda \rangle S^{-1} m_\lambda.$$

In general, any system $(\tilde{m}_\lambda)_{\lambda \in \Lambda}$ satisfying this reconstruction formula is called an associated dual frame.
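
These notions can be illustrated in the simplest possible setting, a finite frame in $\mathbb{R}^2$. The Python sketch below is an assumption-laden toy example and not taken from the paper: it uses three unit vectors at mutual angles of 120 degrees, for which the frame operator is $\frac{3}{2}$ times the identity (a tight frame with $A = B = \frac{3}{2}$), and all function names are hypothetical. It builds the frame operator, the canonical dual frame and the reconstruction formula.

```python
import math

# Three unit vectors at mutual angles of 120 degrees (an illustrative choice).
frame = [(0.0, 1.0),
         (-math.sqrt(3) / 2, -0.5),
         (math.sqrt(3) / 2, -0.5)]

def inner(u, v):
    """Euclidean inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def frame_operator(vectors):
    """Matrix of S f = sum_k <f, m_k> m_k, i.e. S = sum_k m_k m_k^T."""
    s00 = sum(m[0] * m[0] for m in vectors)
    s01 = sum(m[0] * m[1] for m in vectors)
    s11 = sum(m[1] * m[1] for m in vectors)
    return ((s00, s01), (s01, s11))

def apply_inverse(S, v):
    """Solve S w = v for a 2x2 matrix S via the explicit inverse."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return ((S[1][1] * v[0] - S[0][1] * v[1]) / det,
            (S[0][0] * v[1] - S[1][0] * v[0]) / det)

def reconstruct(f, vectors):
    """Evaluate f = sum_k <f, m_k> S^{-1} m_k using the canonical dual frame."""
    S = frame_operator(vectors)
    dual = [apply_inverse(S, m) for m in vectors]
    coeffs = [inner(f, m) for m in vectors]
    return (sum(c * d[0] for c, d in zip(coeffs, dual)),
            sum(c * d[1] for c, d in zip(coeffs, dual)))

print(frame_operator(frame))            # approximately 1.5 times the identity
print(reconstruct((2.0, -1.0), frame))  # recovers the input up to rounding
```

Since the three vectors are linearly dependent, the coefficients of an expansion in this frame are not unique; the canonical dual frame singles out one particular choice.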

Let us turn to the question of efficient encoding. In practice we have to restrict to finite expansions (8),
which usually leads to an approximation error. Given a positive integer $N$, the best $N$-term approximation
$f_N$ of some element $f \in \mathcal{H}$ with respect to the system $(m_\lambda)_\lambda$ is defined by

$$f_N = \operatorname{argmin} \Big\| f - \sum_{\lambda \in \Lambda_N} c_\lambda m_\lambda \Big\| \quad \text{s.t.} \quad \#\Lambda_N \le N.$$

For efficient approximation, it is desirable to find representation systems which provide good sparse
approximation for the considered data, in the sense that the approximation error $\|f - f_N\|$ decays quickly
for $N \to \infty$. Typically one wants to approximate signals in some subclass $\mathcal{C} \subseteq \mathcal{H}$. The approximation
performance of a system with respect to such a class is then usually judged by the worst-case scenario,
i.e. the worst possible decay rate of the error $\|f - f_N\|$ for $f \in \mathcal{C}$. In this sense a system $(m_\lambda)_\lambda$ provides
optimally sparse approximations with respect to $\mathcal{C}$, if its worst-case approximation rates are the best
among all systems.

It is common to consider not the best $N$-term approximation but the $N$-term approximation obtained
by keeping the $N$ largest coefficients. This approximation is better understood and provides a bound for
the best $N$-term approximation error. We will also denote it by $f_N$, since the context will always make
the meaning clear.
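
The thresholding procedure can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names, and it assumes that the representation system is an orthonormal basis, so that the approximation error equals the $\ell^2$ norm of the discarded coefficients; for a general frame this identity does not hold.

```python
import math

def n_term_approximation(coeffs, n):
    """Keep the n largest coefficients (by modulus) and set all others to zero."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:n])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

def approximation_error(coeffs, n):
    """l2 error of the n-term approximation, valid for an orthonormal system."""
    approx = n_term_approximation(coeffs, n)
    return math.sqrt(sum(abs(c - a) ** 2 for c, a in zip(coeffs, approx)))

coeffs = [3, -4, 1, 0.5]
print(n_term_approximation(coeffs, 2))   # the entries 3 and -4 survive
print(approximation_error(coeffs, 2))    # sqrt(1 + 0.25)

# Coefficients decaying like 1/n: the error decreases as more terms are kept.
decaying = [1.0 / n for n in range(1, 1001)]
print([round(approximation_error(decaying, n), 4) for n in (10, 100)])
```

The faster the sorted coefficients decay, the faster the error decreases with $N$.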

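The $N$-term approximation rate achieved by a frame is closely related to the decay of the corresponding
frame coefficients, often measured by a strong or weak $\ell^p$-(quasi-)norm with $p > 0$. The weak
$\ell^p$-(quasi-)norm is defined by

$$\|(c_\lambda)_\lambda\|_{\omega\ell^p} := \Big( \sup_{\varepsilon>0} \varepsilon^p \cdot \#\{\lambda : |c_\lambda| > \varepsilon\} \Big)^{1/p},$$

and by definition a sequence $(c_\lambda)_\lambda \in \omega\ell^p$ if $\|(c_\lambda)_\lambda\|_{\omega\ell^p} < \infty$. Since $\|(c_\lambda)_\lambda\|_{\omega\ell^p} \le \|(c_\lambda)_\lambda\|_{\ell^p}$ for every
sequence $(c_\lambda)_\lambda$, we have the embedding $\ell^p \hookrightarrow \omega\ell^p$. Note also that every non-increasing rearrangement
$(c_n^*)_{n\in\mathbb{N}}$ of a sequence $(c_\lambda)_\lambda \in \omega\ell^p$ satisfies

$$\sup_{n>0} n^{1/p} |c_n^*| = \|(c_\lambda)_\lambda\|_{\omega\ell^p}.$$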
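The two descriptions of the weak quasi-norm can be compared numerically. The following is a minimal sketch for finite sequences only; the function names and the test sequence $c_n = 1/n$ are illustrative assumptions, not part of the original text.

```python
def weak_lp_quasinorm(coeffs, p):
    """Weak l^p quasi-norm via the non-increasing rearrangement: sup_n n^(1/p) * c*_n."""
    c_star = sorted((abs(c) for c in coeffs), reverse=True)
    return max(((n + 1) ** (1.0 / p)) * c for n, c in enumerate(c_star))


def weak_lp_by_counting(coeffs, p):
    """Same quantity from the definition (sup_eps eps^p * #{|c| > eps})^(1/p).

    For a finite sequence the supremum is approached as eps increases to one of
    the attained moduli t, where the count tends to #{|c| >= t}.
    """
    moduli = [abs(c) for c in coeffs]
    levels = {t for t in moduli if t > 0}
    best = max(t ** p * sum(1 for m in moduli if m >= t) for t in levels)
    return best ** (1.0 / p)


def lp_quasinorm(coeffs, p):
    """Strong l^p (quasi-)norm of a finite sequence."""
    return sum(abs(c) ** p for c in coeffs) ** (1.0 / p)


# Hypothetical test sequence c_n = 1/n, n = 1..200: weakly but not uniformly summable.
coeffs = [1.0 / n for n in range(1, 201)]
weak = weak_lp_quasinorm(coeffs, p=1.0)
strong = lp_quasinorm(coeffs, p=1.0)
print(weak, weak_lp_by_counting(coeffs, 1.0), strong)
```

For this sequence the weak quasi-norm stays bounded as the length grows, while the strong norm grows like a harmonic sum, which illustrates that the embedding $\ell^p \hookrightarrow \omega\ell^p$ is strict.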

The well-known result below (see [2]), whose proof can be found e.g. in [28], shows that membership
of the expansion coefficients in an $\omega\ell^p$ space for small $p > 0$ implies good $N$-term approximation rates.

Lemma 3.1 ([28, Lemma 5.1]). Let $(m_\lambda)_{\lambda\in\Lambda}$ be a frame in $\mathcal{H}$ and $f = \sum_\lambda c_\lambda m_\lambda$ an expansion of $f \in \mathcal{H}$
with respect to this frame. If $(c_\lambda)_\lambda \in \omega\ell^{2/(p+1)}(\Lambda)$ for some $p > 0$, then the $N$-term approximation rate
for $f$ achieved by keeping the $N$ largest coefficients is at least of order $N^{-p/2}$, i.e.

$$\|f - f_N\|_2^2 \lesssim N^{-p}.$$

In particular, the error of best $N$-term approximation decays at least with order $N^{-p/2}$.

As illustrated by Lemma 3.1, the decay rate of the frame coefficients determines the $N$-term approximation
rate. In particular, if the sequence $(\langle f, m_\lambda\rangle)_\lambda$ of frame coefficients lies in $\ell^p$ for $p < 2$, the best
approximation rate of the dual frame is at least of order $N^{-(1/p-1/2)}$.
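The relation between coefficient decay and approximation rate can be made concrete in the simplest setting. The sketch below assumes an orthonormal system (so that the squared $N$-term error equals the energy of the discarded coefficients) and the hypothetical model sequence $c_n = n^{-(p+1)/2}$, which has weak $\ell^{2/(p+1)}$ quasi-norm 1; the truncation `n_max` is likewise an assumption made to obtain a finite computation.

```python
def tail_energy(p, N, n_max):
    """Squared N-term error sum_{n > N} |c_n|^2 for c_n = n^{-(p+1)/2}.

    Assumes an orthonormal system and truncates the sequence at n_max.
    """
    return sum(n ** (-(p + 1.0)) for n in range(N + 1, n_max + 1))


p = 1.0
for N in (10, 100, 1000):
    err_sq = tail_energy(p, N, n_max=200_000)
    bound = N ** (-p) / p          # integral comparison: sum_{n>N} n^{-(p+1)} <= N^{-p} / p
    print(N, err_sq, bound)
```

The printed errors stay below the integral bound $N^{-p}/p$, in line with a squared error of order $N^{-p}$.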

Let us now assume that we have two frames $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ in the Hilbert space $\mathcal{H}$ and
expansion coefficients for $f \in \mathcal{H}$ with respect to these two systems. Then these frames provide the same
$N$-term approximation rate for $f$, if the corresponding expansion coefficients have similar decay, e.g. if
they belong to the same $\ell^p$-space. We recall [28, Proposition 5.2] and formulate this result in an abstract
Hilbert space setting.

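Proposition 3.2 ([28, Proposition 5.2]). Let $p \in (0,2)$, and let $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ be frames in a
Hilbert space $\mathcal{H}$ and $(\tilde m_\lambda)_{\lambda\in\Lambda}$ a dual frame for $(m_\lambda)_{\lambda\in\Lambda}$ such that

$$\big\| (\langle m_\lambda, p_\mu\rangle)_{\lambda\in\Lambda,\,\mu\in\Delta} \big\|_{\ell^p\to\ell^p} < \infty.$$

Then $(\langle f, \tilde m_\lambda\rangle)_\lambda \in \ell^p(\Lambda)$ implies $(\langle f, p_\mu\rangle)_\mu \in \ell^p(\Delta)$. In particular, $f \in \mathcal{H}$ can be encoded by the $N$ largest
frame coefficients from $(\langle f, p_\mu\rangle)_\mu$ up to accuracy $\lesssim N^{-(1/p-1/2)}$.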

This proposition motivates the following notion of sparsity equivalence, initially introduced in [27] for
parabolic molecules.

Definition 3.3 ([27, Definition 4.2]). Let $p \in (0,\infty]$, and let $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ be frames in a Hilbert
space $\mathcal{H}$. Then $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ are sparsity equivalent in $\ell^p$, if

$$\big\| (\langle m_\lambda, p_\mu\rangle)_{\lambda\in\Lambda,\,\mu\in\Delta} \big\|_{\ell^p\to\ell^p} < \infty.$$

Sparsity equivalence, as pointed out in [28], is not an equivalence relation. Nevertheless, it allows one to
transfer approximation properties from one anchor system to other systems.


3.2 Consistency of Parametrizations 

Our aim in this section is to categorize frames of $\alpha$-molecules in $L^2(\mathbb{R}^d)$ with respect to their
approximation behavior, building upon the notion of sparsity equivalence. We emphasize that a system of
$\alpha$-molecules does not per se constitute a frame. In fact, the question whether a system of functions in $L^2(\mathbb{R}^d)$
is a frame is decoupled from the question whether it forms a system of $\alpha$-molecules.

Theorem 3.7 will provide sufficient conditions for two frames of $\alpha$-molecules to be sparsity equivalent,
based upon the notion of $(\alpha,k)$-consistency originally introduced in [55]. To motivate this concept, we
recall a simple estimate from [57] for the operator norm of a matrix on discrete $\ell^p$ spaces.

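Lemma 3.4 ([57, Lemma 4.4]). Let $\Lambda$, $\Delta$ be two discrete index sets, and let $A : \ell^p(\Lambda) \to \ell^p(\Delta)$, $p > 0$,
be a linear mapping defined by its matrix representation $A = (A_{\lambda,\mu})_{\lambda\in\Lambda,\,\mu\in\Delta}$. Then we have the bound

$$\|A\|_{\ell^p(\Lambda)\to\ell^p(\Delta)} \le \max\Big\{ \sup_{\lambda\in\Lambda} \sum_{\mu\in\Delta} |A_{\lambda,\mu}|^{\min\{1,p\}},\ \sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} |A_{\lambda,\mu}|^{\min\{1,p\}} \Big\}^{1/\min\{1,p\}}.$$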
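A bound of this Schur type is easy to probe on a small matrix. The sketch below is only a plausibility check under assumptions: it reads the bound as $\max\{\sup_\lambda \sum_\mu |A_{\lambda,\mu}|^q, \sup_\mu \sum_\lambda |A_{\lambda,\mu}|^q\}^{1/q}$ with $q = \min\{1,p\}$, uses a hypothetical random $5\times 6$ matrix, and compares the ratio $\|Ax\|_p / \|x\|_p$ for random vectors against that bound. All names are illustrative.

```python
import random


def schur_bound(A, p):
    """max{sup over rows, sup over columns of sum |a|^q}^(1/q) with q = min(1, p).

    A[i][j] plays the role of A_{lambda, mu}: row index lambda, column index mu.
    """
    q = min(1.0, p)
    row_sums = [sum(abs(a) ** q for a in row) for row in A]
    col_sums = [sum(abs(row[j]) ** q for row in A) for j in range(len(A[0]))]
    return max(max(row_sums), max(col_sums)) ** (1.0 / q)


def apply_matrix(A, x):
    """(Ax)_mu = sum_lambda A_{lambda, mu} x_lambda."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def lp(x, p):
    """l^p (quasi-)norm of a finite vector."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


rng = random.Random(0)
A = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(5)]
worst_ratio = {}
for p in (0.5, 1.0, 2.0):
    ratios = []
    for _ in range(200):
        x = [rng.uniform(-1, 1) for _ in range(5)]
        ratios.append(lp(apply_matrix(A, x), p) / lp(x, p))
    # observed operator-norm lower estimate divided by the claimed upper bound
    worst_ratio[p] = max(ratios) / schur_bound(A, p)
print(worst_ratio)
```

Random sampling only gives a lower estimate of the operator norm, so ratios below one are consistent with the lemma but do not prove it.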

We apply this lemma to the cross-Gramian $A$ of two systems of $\alpha$-molecules, and aim for sufficient
conditions for the right-hand side to be finite. The following notion was introduced in [55].

Definition 3.5 ([55, Definition 5.5]). Let $\alpha \in [0,1]$ and $k > 0$. Two parametrizations $(\Lambda, \Phi_\Lambda)$ and
$(\Delta, \Phi_\Delta)$ are called $(\alpha,k)$-consistent, if

$$\sup_{\lambda\in\Lambda} \sum_{\mu\in\Delta} \omega_\alpha(\lambda,\mu)^{-k} < \infty \quad\text{and}\quad \sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} \omega_\alpha(\lambda,\mu)^{-k} < \infty. \qquad (9)$$

Remark 3.6. Note that due to symmetry of the distance function $\omega_\alpha$ it suffices to check only one of the
two conditions in equation (9) to prove $(\alpha,k)$-consistency.

In view of Theorem 2.5, the consistency of the parametrizations of two systems of $\alpha$-molecules provides
a convenient sufficient condition for their sparsity equivalence.

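Theorem 3.7. Let $\alpha \in [0,1]$, $d \in \mathbb{N}$, $d \ge 2$, $k > 0$, and $p \in (0,\infty]$. Let $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ be
two frames of $d$-dimensional $\alpha$-molecules of order $(L, M, N_1, N_2)$ with $(\alpha,k)$-consistent parametrizations
$(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$ satisfying

$$s_\lambda \ge c, \quad s_\mu \ge c \quad \text{for all } \lambda \in \Lambda,\ \mu \in \Delta,$$

and with $q := \min\{1,p\}$

$$L \ge 2\frac{k}{q}, \quad M > 3\frac{k}{q} - d + \frac{1+\alpha(d-1)}{2}, \quad N_1 > \frac{d}{2}, \quad N_1 \ge \frac{k}{q} + \frac{1+\alpha(d-1)}{2}, \quad N_2 \ge 2\frac{k}{q} + d - 2.$$

Then $(m_\lambda)_{\lambda\in\Lambda}$ and $(p_\mu)_{\mu\in\Delta}$ are sparsity equivalent in $\ell^p$.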

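Proof. Let $q = \min\{1,p\}$. By Lemma 3.4 it suffices to prove that

$$\max\Big\{ \sup_{\lambda\in\Lambda} \sum_{\mu\in\Delta} |\langle m_\lambda, p_\mu\rangle|^q,\ \sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} |\langle m_\lambda, p_\mu\rangle|^q \Big\}^{1/q} < \infty.$$

Since, by Theorem 2.5, we have $|\langle m_\lambda, p_\mu\rangle| \lesssim \omega_\alpha(\lambda,\mu)^{-k/q}$, we can conclude that

$$\max\Big\{ \sup_{\lambda\in\Lambda} \sum_{\mu\in\Delta} |\langle m_\lambda, p_\mu\rangle|^q,\ \sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} |\langle m_\lambda, p_\mu\rangle|^q \Big\} \lesssim \max\Big\{ \sup_{\lambda\in\Lambda} \sum_{\mu\in\Delta} \omega_\alpha(\lambda,\mu)^{-k},\ \sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} \omega_\alpha(\lambda,\mu)^{-k} \Big\},$$

with the expression on the right-hand side being finite due to the $(\alpha,k)$-consistency of the
parametrizations $(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$. The proof is completed. □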


This theorem allows us to categorize frames of $\alpha$-molecules according to their sparse approximation
behavior. The general strategy is as follows. If an approximation result for a specific system of
$\alpha$-molecules is known and a class of $\alpha$-molecules satisfies the hypotheses of Theorem 3.7, i.e. they are all
sparsity equivalent to this specific system, they automatically inherit its known approximation behavior.
In this way, one in the end obtains a stand-alone result for frames of $\alpha$-molecules to exhibit sparse
approximation, depending solely on the parametrization and the order.


4 Sparse Approximation of Video Data 

In this section we demonstrate with a specific example how the machinery of $\alpha$-molecules can be applied
in practice. In our exemplary application, we are interested in the sparse representation of video signals,
modelled by the class of cartoon-like functions introduced below.

Following the general methodology, we first need a suitable anchor system, for which a sparse
approximation result with respect to this class is known. Utilizing Theorem 3.7, the framework can then transfer
the approximation rate from this reference system to other systems. In this way we will be able to identify
a large class of representation systems which provide almost optimal sparse approximation for this class.

4.1 Cartoon-like Functions 

A suitable model for image and video data is provided by the class of cartoon-like functions, first
introduced by Donoho [15] and later extended e.g. in [41]. We shall use the following simplified model in
dimensions $d = 2$ and $d = 3$.

Definition 4.1 ([15], [41, Definition 2.1]). For fixed $\nu > 0$ and $d \in \{2,3\}$ the class of cartoon-like
functions $\mathcal{E}^2(\mathbb{R}^d)$ consists of functions $f : \mathbb{R}^d \to \mathbb{C}$ of the form

$$f = f_0 + f_1 \chi_B,$$

where $B \subset [0,1]^d$ and $f_i \in C^2(\mathbb{R}^d)$ with $\operatorname{supp} f_i \subset [0,1]^d$ and $\|f_i\|_{C^2} \le 1$ for each $i = 0,1$. For dimension
$d = 2$, we assume that the boundary $\partial B$ is a closed $C^2$-curve with curvature bounded by $\nu$, and, for $d = 3$,
the discontinuity $\partial B$ shall be a closed $C^2$-surface with principal curvatures bounded by $\nu$.

This model is justified by the observation that real-life images and video data typically consist of
smooth regions, separated by piecewise smooth boundaries. Without loss of generality we restrict in
Definition 4.1 to cartoon-like functions with smooth boundaries.

4.2 Optimal Approximation for Cartoon-like Functions

In [15, 41] the optimal approximation rates for $\mathcal{E}^2(\mathbb{R}^2)$ and $\mathcal{E}^2(\mathbb{R}^3)$, achievable with algorithms satisfying
a polynomial depth search constraint, were derived. An algorithm for sparse approximation is thereby
restricted to polynomial depth search in a given (countable) dictionary, if there exists a polynomial $\pi$
such that the algorithm only chooses from the first $\pi(N)$ vectors of the dictionary when forming the
$N$:th sparse approximation $f_N$. Such a constraint is very natural from a practical standpoint and has
the nice side-effect that it ensures the existence of the best $N$-term approximation. In general, this may
not be the case, as the instance of a countable dense subset of $L^2(\mathbb{R}^d)$ as a dictionary shows. Here the
best 1-term approximation does not always exist.
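The depth search restriction can be illustrated with a toy selection rule. The sketch below assumes an orthonormal dictionary, so that choosing dictionary elements amounts to choosing coefficients; the coefficient values and the polynomial $\pi(N) = N^2$ are hypothetical choices made only for illustration.

```python
def n_term_depth_search(coeffs, N, depth):
    """Indices of the N largest |coeffs[i]| among the first depth(N) entries."""
    limit = min(depth(N), len(coeffs))
    by_size = sorted(range(limit), key=lambda i: -abs(coeffs[i]))
    return sorted(by_size[:N])


# Hypothetical coefficient sequence with one large entry far out in the dictionary.
coeffs = [0.5, 0.1, 0.4, 0.05, 0.3, 0.02, 0.01, 0.9]


def pi(N):
    """Assumed search-depth polynomial pi(N) = N^2."""
    return N * N


chosen_2 = n_term_depth_search(coeffs, 2, pi)   # searches indices 0..3 only
chosen_3 = n_term_depth_search(coeffs, 3, pi)   # searches all 8 entries (pi(3) = 9 is capped)
print(chosen_2, chosen_3)
```

For $N = 2$ the large coefficient at index 7 lies beyond the search depth and is not selected; it only becomes available once $\pi(N)$ reaches it.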

We now cite the result of [15, 41]. In [41] it is conjectured that the result also generalizes to higher
dimensions.

Theorem 4.2 ([15, Theorem 7.2], [41, Theorem 3.2]). Let $d \in \{2,3\}$. The best $N$-term approximation
rate for $\mathcal{E}^2(\mathbb{R}^d)$, achieved by an arbitrary dictionary under the restriction of polynomial depth search,
cannot exceed

$$\|f - f_N\|_2^2 \lesssim N^{-2/(d-1)},$$

where $f_N$ is the best $N$-term approximation of $f \in \mathcal{E}^2(\mathbb{R}^d)$.

For d = 2 it has been shown that this rate can indeed be achieved [15] using so-called wedgelets, 
which are adaptive to the data. Moreover, there are several examples of non-adaptive frames in two and 
three dimensions which almost provide these optimal rates [5J[3I1IM1I31I1I], typically up to log-terms. 
In particular, it was proven by Guo and Labate in [33] that the smooth Parseval frame of band-limited
3D-shearlets $SH$ constructed by them in [M] sparsely approximates the class $\mathcal{E}^2(\mathbb{R}^3)$ with an almost
optimal approximation rate. We recall the definition of this particular system in Subsection 5.3.1 and
state the approximation result below.

Theorem 4.3 ([33, Theorem 3.1]). Let $SH = \{\psi_\lambda\}_{\lambda\in\Lambda^{SH}}$ be the smooth Parseval frame of 3D-shearlets
defined in Subsection 5.3.1. Then the sequence of shearlet coefficients $\theta_\lambda(f) := \langle f, \psi_\lambda\rangle$, $\lambda \in \Lambda^{SH}$,
associated with $f \in \mathcal{E}^2(\mathbb{R}^3)$ satisfies

$$\sup_{f\in\mathcal{E}^2(\mathbb{R}^3)} |\theta_\lambda(f)|_N \lesssim N^{-1} \cdot \log(N),$$

where $|\theta_\lambda(f)|_N$ denotes the $N$:th largest shearlet coefficient.

Theorem 4.3 shows that the shearlet coefficients belong to $\omega\ell^p(\Lambda^{SH})$ for every $p > 1$. In view of
Lemma 3.1, for every $f \in \mathcal{E}^2(\mathbb{R}^3)$, the frame $SH$ therefore provides at least the approximation rate

$$\|f - f_N\|_2^2 \lesssim N^{-1+\varepsilon}, \quad \varepsilon > 0 \text{ arbitrary}, \qquad (10)$$

where $f_N$ denotes the $N$-term approximation obtained from the $N$ largest coefficients. According to
Theorem 4.2, this is almost the optimal approximation rate achievable for cartoon-like functions $\mathcal{E}^2(\mathbb{R}^3)$.
For small $\varepsilon > 0$, we get arbitrarily close to the optimal rate.
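The step from the coefficient decay to the rate (10) can be spelled out as follows. This is a short sketch that reads Lemma 3.1 as a statement about the space $\omega\ell^{2/(\tilde p+1)}$, which is an assumption where the extracted statement of the lemma is incomplete; $\theta_n^*$ denotes the $n$-th largest coefficient and $0 < \varepsilon < 1$.

```latex
\begin{align*}
  \theta_n^* \lesssim n^{-1}\log n
  \;&\Longrightarrow\;
  \sup_{n} n^{1/p}\,\theta_n^* \lesssim \sup_{n} n^{1/p-1}\log n < \infty
  \quad\text{for every } p > 1, \\
  \tilde p := 1-\varepsilon,\quad \frac{2}{\tilde p+1} = \frac{2}{2-\varepsilon} > 1
  \;&\Longrightarrow\;
  \|f-f_N\|_2^2 \lesssim N^{-\tilde p} = N^{-1+\varepsilon}.
\end{align*}
```

The logarithm is thus absorbed by an arbitrarily small loss $\varepsilon$ in the exponent.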

The idea to use 3D-shearlets for processing video data has been tested in practice in [5^. The 
authors of said article develop a clever discretisation procedure, allowing a fast computation of the 
shearlet coefficients. They then test how their discrete shearlet transform can be used to denoise and 
enhance video sequences, with promising results.

4.3 Transfer of the Approximation Rate

Our final goal is to find a large class of representation systems which achieve the almost optimal rate (10).
For this, we put the machinery of $\alpha$-molecules to work. Via Theorem 3.7 it is possible to transfer the
approximation rate (10) established for the smooth Parseval frame of 3D-shearlets $SH$ to other systems
of 3-dimensional $\tfrac12$-molecules. In fact, the frame $SH$ is a suitable choice for the reference system, since by
Proposition 5.9 it constitutes a system of 3-dimensional $\tfrac12$-molecules of order $(\infty,\infty,\infty,\infty)$ with respect
to the parametrization $(\Lambda^{SH}, \Phi^{SH})$. The following result is then a direct application of the general theory.

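Theorem 4.4. Assume that a frame $(m_\lambda)_{\lambda\in\Lambda}$ of 3-dimensional $\tfrac12$-molecules satisfies, for some $k > 0$,
the following two conditions:

(i) its parametrization $(\Lambda, \Phi_\Lambda)$ is $(\tfrac12,k)$-consistent with $(\Lambda^{SH}, \Phi^{SH})$,

(ii) its order $(L, M, N_1, N_2)$ satisfies

$$L \ge 2k, \quad M > 3k-2, \quad N_1 \ge k+1, \quad N_1 > 3/2, \quad N_2 \ge 2k+1.$$

Then each dual frame $(\tilde m_\lambda)_{\lambda\in\Lambda}$ possesses an almost optimal $N$-term approximation rate for the class of
cartoon-like functions $\mathcal{E}^2(\mathbb{R}^3)$, i.e. for all $f \in \mathcal{E}^2(\mathbb{R}^3)$

$$\|f - f_N\|_2^2 \lesssim N^{-1+\varepsilon}, \quad \varepsilon > 0 \text{ arbitrary},$$

where $f_N$ denotes the $N$-term approximation obtained from the $N$ largest frame coefficients.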

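Proof. Let $SH = \{\psi_\lambda\}_{\lambda\in\Lambda^{SH}}$ be the Parseval frame of 3D-shearlets from Subsection 5.3.1 and take
$f \in \mathcal{E}^2(\mathbb{R}^3)$. By Theorem 4.3 the sequence of shearlet coefficients $(\theta_\lambda)_\lambda$ given by $\theta_\lambda = \langle f, \psi_\lambda\rangle$ belongs to
$\omega\ell^p(\Lambda^{SH})$ for every $p > 1$. Since $\omega\ell^p \hookrightarrow \ell^{p+\varepsilon}$ for arbitrary $\varepsilon > 0$, this further implies $(\theta_\lambda)_\lambda \in \ell^p(\Lambda^{SH})$
for every $p > 1$. Let now

$$f = \sum_{\mu\in\Lambda} c_\mu \tilde m_\mu$$

be the canonical expansion of $f$ with respect to the dual frame $(\tilde m_\mu)_{\mu\in\Lambda}$ with frame coefficients $(c_\mu)_\mu$. Note
that since $SH$ is a Parseval frame, the canonical dual frame of $SH$ is equal to $SH$ itself. Therefore the
coefficients are given by

$$c_\mu = \langle f, m_\mu\rangle = \Big\langle \sum_{\lambda} \theta_\lambda \psi_\lambda, m_\mu \Big\rangle = \sum_{\lambda} \theta_\lambda \langle \psi_\lambda, m_\mu\rangle.$$

Thus, they are related to the shearlet coefficients $(\theta_\lambda)_\lambda$ by the cross-Gramian $(\langle \psi_\lambda, m_\mu\rangle)_{\lambda,\mu}$. By
Theorem 3.7, conditions (i) and (ii) guarantee that the frame $(m_\mu)_{\mu\in\Lambda}$ is sparsity equivalent to $(\psi_\lambda)_{\lambda\in\Lambda^{SH}}$
in $\ell^p$ for every $p > 1$. This implies that the cross-Gramian is a bounded operator $\ell^p(\Lambda^{SH}) \to \ell^p(\Lambda)$,
which maps $(\theta_\lambda)_\lambda$ to $(c_\mu)_\mu$. Hence, $(c_\mu)_\mu \in \ell^p(\Lambda)$ for every $p > 1$. The embedding $\ell^p \hookrightarrow \omega\ell^p$ then proves
$(c_\mu)_\mu \in \omega\ell^p(\Lambda)$ for every $p > 1$. Finally, for arbitrary $\varepsilon > 0$, the application of Lemma 3.1 yields

$$\|f - f_N\|_2^2 \lesssim N^{-1+\varepsilon},$$

where $f_N$ denotes the $N$-term approximation with respect to the system $(\tilde m_\mu)_\mu$ obtained by choosing the
$N$ largest coefficients. □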

Theorem 4.4 specifies a large class of multiscale systems with almost optimal approximation
performance for video data $\mathcal{E}^2(\mathbb{R}^3)$. According to Remark 5.11, condition (i) is in particular fulfilled by every
$\tfrac12$-shearlet parametrization (see Theorem 5.3) for $k > 3$. Hence, due to condition (ii), all systems of
3-dimensional $\tfrac12$-shearlet molecules of order

$$L \ge 7, \quad M \ge 8, \quad N_1 \ge 5, \quad N_2 \ge 8,$$

provide almost optimal approximation for $\mathcal{E}^2(\mathbb{R}^3)$.
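The arithmetic behind these integer orders can be retraced mechanically. The sketch below encodes the order conditions of Theorem 3.7 as we read them from the damaged formula (an assumption): $L \ge 2k/q$, $M > 3k/q - d + (1+\alpha(d-1))/2$, $N_1 \ge k/q + (1+\alpha(d-1))/2$, $N_1 > d/2$, $N_2 \ge 2k/q + d - 2$ with $q = \min\{1,p\}$. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding at the thresholds.

```python
from fractions import Fraction
from math import ceil, floor


def order_thresholds(alpha, d, k, p):
    """Thresholds (L_min, M_strict, N1_min, N1_strict, N2_min), meaning
    L >= L_min, M > M_strict, N1 >= N1_min, N1 > N1_strict, N2 >= N2_min.
    This is our reading of Theorem 3.7, not a verified transcription."""
    alpha, k = Fraction(alpha), Fraction(k)
    q = Fraction(1) if p >= 1 else Fraction(p)        # q = min(1, p)
    shift = (1 + alpha * (d - 1)) / 2
    return (2 * k / q, 3 * k / q - d + shift, k / q + shift,
            Fraction(d, 2), 2 * k / q + d - 2)


def smallest_integer_order(alpha, d, k, p):
    """Smallest integers (L, M, N1, N2) meeting the thresholds above."""
    L_min, M_strict, N1_min, N1_strict, N2_min = order_thresholds(alpha, d, k, p)
    return (ceil(L_min), floor(M_strict) + 1,
            max(ceil(N1_min), floor(N1_strict) + 1), ceil(N2_min))


# d = 3, alpha = 1/2, q = 1: the thresholds of Theorem 4.4 (ii) evaluated at k = 3 ...
print(order_thresholds(Fraction(1, 2), 3, 3, 2))
# ... and the smallest admissible integer order for a k slightly larger than 3.
print(smallest_integer_order(Fraction(1, 2), 3, Fraction(3001, 1000), 2))
```

Under this reading, any $k$ slightly above 3 leads to the integer order $(7, 8, 5, 8)$ stated above.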

Taking into account Proposition 5.12, the statement of Theorem 4.4 in particular includes the following
result for compactly supported shearlet frames.

Corollary 4.5. Any dual frame of a shearlet frame of the form (27) generated by compactly supported
functions so that (p £ C^^(]R^) and so that for each s £ {1,2,3} 

(i) $\partial^\gamma \psi^\varepsilon$ exists and is continuous for every $\gamma = (\gamma_1, \gamma_2, \gamma_3)^T \in \mathbb{N}_0^3$ with $|\gamma|_\infty \le 13$ and $\gamma_\varepsilon \le 5$,

(ii) $\psi^\varepsilon$ has at least 15 vanishing directional moments in direction $e_\varepsilon$,

provides the almost optimal approximation rate (10) for the cartoon video class $\mathcal{E}^2(\mathbb{R}^3)$.

This corollary is a new result for compactly supported shearlets on its own. A similar result was
proved in [41]. In comparison, the most intriguing fact is the simplicity of its deduction: the framework
of $\alpha$-molecules enables a simple transport of the decay rates.

Remark 4.6. The 2-dimensional counterpart of Theorem 4.4 is contained in [28, Theorem 5.12] for
the choice $\alpha = \tfrac12$. In [28] it is further shown that not only any system of bivariate $\tfrac12$-shearlet molecules
satisfies the conditions of this theorem, provided that the order is sufficiently high, but also every system
of sufficiently high order $\tfrac12$-curvelet molecules. This implies that also curvelet-like constructions yield
almost optimal approximation for the class $\mathcal{E}^2(\mathbb{R}^2)$, including the classical curvelet frame.

In contrast, a ‘true’ curvelet construction in 3D is not known to the authors. Note that, despite the 
misleading name, the construction of the ‘3D discrete curvelet transform ’ in is actually shear-based. 
Still, it can be expected that any curvelet-like system would fall into our framework, and thus Theorem 4.4
would immediately establish the almost optimal approximation rate.

5 Shearlet Systems in d Dimensions 

We introduce a very general class of shear-based systems, namely systems of $\alpha$-shearlet molecules. The
definition in $d$ dimensions is analogous to the 2-dimensional case [55]. Roughly speaking, they are
shear-based systems obtained from variable generators, where similar to $\alpha$-molecules the conditions on the
generators have been relaxed to a mere time-frequency localization requirement. The notion of $\alpha$-shearlet
molecules comprises many specific shear-based constructions and simplifies the treatment of such systems
within the general framework of $\alpha$-molecules.

5.1 Multidimensional $\alpha$-Shearlet Molecules

As explained in Section (T] shearlet-like constructions are based on anisotropic scaling, shearings, and 
translations. For the change of scale, we utilize a-scaling as defined by ([S]). The change of orientation is 
provided by shearings, in $d$ dimensions given by the shearing matrices

$$S_h := \begin{pmatrix} I_{d-1} & h \\ 0 & 1 \end{pmatrix}, \quad h \in \mathbb{R}^{d-1},$$

which are the natural generalizations of ([5]). The matrix $S_h$ shears parallel to the $(e_1,\ldots,e_{d-1})$-plane
and the shear vector $h \in \mathbb{R}^{d-1}$ determines the direction of the shearing in this plane. The transformations
associated with shearings and $\alpha$-scalings naturally form a group [9].

To avoid directional bias, the frequency domain is divided into cone-like regions along the coordinate
axes and a coarse-scale box for the low frequencies. Note that this comes at the cost of losing the group
properties mentioned above. This division procedure is however crucial for applications, and also, as the
subsequent arguments will show, for including $\alpha$-shearlets in the concept of $\alpha$-molecules. The cone-like
regions along the $e_\varepsilon$-axes shall be called pyramids and are explicitly given by

$$\mathcal{P}_\varepsilon = \big\{ \xi \in \mathbb{R}^d \mid \forall i \in \{1,\ldots,d\} : |\xi_i| \le |\xi_\varepsilon| \big\},$$

where $\varepsilon \in \{1,\ldots,d\}$. $\varepsilon = 0$ shall refer to a coarse-scale box of the form $\mathcal{R} = \{\xi \in \mathbb{R}^d : |\xi|_\infty \le C\}$, where
$C > 0$ is a suitably chosen constant. In the sequel we will always stay in this so-called cone-adapted
setting. For an illustration of this specific setting in 3D, we refer to Subsection 5.3.

In each cone we require different versions of the scaling and shearing operators. The cyclic permutation
matrix

$$Z = \begin{pmatrix} 0 & 1 \\ I_{d-1} & 0 \end{pmatrix} \in \mathbb{R}^{d\times d}, \qquad (11)$$

which maps $e_d$ to $e_1$ and $e_i$ to $e_{i+1}$ for $i < d$, allows us to elegantly define these operators associated with the
respective cones by $Z^\varepsilon S_h Z^{-\varepsilon}$ and $Z^\varepsilon A_{\alpha,s} Z^{-\varepsilon}$.
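The cone-adapted partition of the frequency domain described above can be sketched as a small lookup. The function name, the convention that the box takes precedence over the pyramids, and the choice to return all pyramids on shared boundaries are assumptions made for illustration.

```python
def pyramid_indices(xi, box_radius):
    """Return [0] for the coarse-scale box |xi|_inf <= box_radius, otherwise the
    1-based indices eps of all pyramids P_eps containing xi, i.e. those with
    |xi_i| <= |xi_eps| for every i."""
    sup_norm = max(abs(t) for t in xi)
    if sup_norm <= box_radius:
        return [0]
    return [eps + 1 for eps, t in enumerate(xi) if abs(t) == sup_norm]


print(pyramid_indices((0.1, 0.2, 3.0), 1.0))    # dominated by the third coordinate
print(pyramid_indices((0.5, -0.5, 0.2), 1.0))   # low frequency: coarse-scale box
print(pyramid_indices((2.0, -2.0, 1.0), 1.0))   # on the common boundary of two pyramids
```

A frequency vector belongs to the pyramid of its dominant coordinate, so the pyramids overlap only on sets where two coordinates have equal modulus.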

Before we come to the definition of $\alpha$-shearlet molecules, we need to introduce a set of characteristic
parameters associated with these systems. The resolution of the underlying sampling grid is determined by
the parameters $\sigma > 1$, $\tau_1, \ldots, \tau_d > 0$, and a sequence $\Theta = (\eta_j)_{j\in\mathbb{N}_0} \subset \mathbb{R}_+$. The parameter $\sigma$ specifies the
fineness of the scale sampling. The parameters $\tau_\varepsilon$, $\varepsilon \in \{1,\ldots,d\}$, determine the spatial resolution in the
$e_\varepsilon$-direction. For convenience they are summarized in the diagonal matrix $T := \operatorname{diag}(\tau_1,\ldots,\tau_d) \in \mathbb{R}^{d\times d}$.
The angular resolution at each scale $j \in \mathbb{N}_0$ is given by the value $\eta_j$ of the sequence $\Theta$. Last but not
least, in each cone $\varepsilon \in \{1,\ldots,d\}$ and at each scale $j \in \mathbb{N}_0$ the shearing parameter $\ell$ is restricted to a set
$\mathcal{L}_{\varepsilon,j} \subseteq \mathbb{Z}^{d-1}$. These sets are collected in $\mathcal{L} := \{\mathcal{L}_{\varepsilon,j} : \varepsilon \in \{1,\ldots,d\},\ j \in \mathbb{N}_0\}$.

After the introduction of this sampling data $\mathbb{D} := \{\sigma, \Theta, \mathcal{L}, T\}$ we can now give the definition of a
system of $\alpha$-shearlet molecules in $d$ dimensions, depending on $\mathbb{D}$. The scale-dependent step size $\eta_j$ of the
directional sampling is assumed to satisfy $\eta_j \asymp \sigma^{-j(1-\alpha)}$ for $j \in \mathbb{N}_0$. Further, we require the upper bounds
$L_j := \max\{ |\ell|_\infty : \ell \in \mathcal{L}_{\varepsilon,j},\ \varepsilon \in \{1,\ldots,d\} \}$, $j \in \mathbb{N}_0$, to fulfill the complementary condition $L_j \lesssim \sigma^{j(1-\alpha)}$.

We remark that the translation parameters may also vary with the indices $(\varepsilon, j, \ell)$, as long as their
values are restricted to some fixed interval $[\tau_{\min}, \tau_{\max}]$ with $0 < \tau_{\min} \le \tau_{\max} < \infty$. However, this is not
indicated in the notation.


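Definition 5.1. Let $\alpha \in [0,1]$, $d \in \mathbb{N}$, $d \ge 2$, and $L, M, N_1, N_2 \in \mathbb{N}_0 \cup \{\infty\}$. Further the sampling data
$\mathbb{D}$ shall be given as above. For $\varepsilon \in \{1,\ldots,d\}$, a system of functions

$$\Sigma_\varepsilon := \big\{ m^\varepsilon_{j,\ell,k} \in L^2(\mathbb{R}^d) : (j,\ell,k) \in \Lambda^\varepsilon \big\},$$

indexed by the set $\Lambda^\varepsilon := \{ (j,\ell,k) : j \in \mathbb{N}_0,\ \ell \in \mathcal{L}_{\varepsilon,j},\ k \in \mathbb{Z}^d \}$, is called a system of $d$-dimensional
$\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$ associated with the orientation $\varepsilon$, if it is of the form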




( 12 ) 


with generating functions $\gamma^\varepsilon_{j,\ell,k} \in L^2(\mathbb{R}^d)$ satisfying for every $\rho \in \mathbb{N}_0^d$ with $|\rho|_1 \le L$




< 


iin{l,a-^' + \[Z-^i]d\} 


M 


(13) 


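The implicit constant is required to be uniform over $\Lambda^\varepsilon$. If one of the parameters $L, M, N_1, N_2$ takes the
value $\infty$, this shall mean that condition (13) is fulfilled with the respective quantity arbitrarily large.

Combining systems of $\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$ for each orientation $\varepsilon \in \{1,\ldots,d\}$
with a system of coarse-scale elements

$$\Sigma_0 := \big\{ m^0_{0,0,k} := \gamma^0_{0,0,k}(\cdot - Tk) : k \in \mathbb{Z}^d \big\}, \qquad (14)$$

where the generators $\gamma^0_{0,0,k} \in L^2(\mathbb{R}^d)$ fulfill $|\partial^\rho \hat\gamma^0_{0,0,k}(\xi)| \lesssim \langle |\xi| \rangle^{-N_1} \langle |\xi|_{[d-1]} \rangle^{-N_2}$ for every $\rho \in \mathbb{N}_0^d$ with
$|\rho|_1 \le L$, yields a system of $\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$. The associated index set is

$$\Lambda^0 := \{ (0,0,k) : k \in \mathbb{Z}^d \} \subset \mathbb{N}_0 \times \mathbb{Z}^{d-1} \times \mathbb{Z}^d.$$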

Definition 5.2. For each $\varepsilon \in \{1,\ldots,d\}$, let $\Sigma_\varepsilon$ be a system of $\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$
associated with the respective orientation. Further, let $\Sigma_0$ be a system of coarse-scale elements
defined as in (14). Then the union

$$\Sigma := \bigcup_{\varepsilon=0}^{d} \Sigma_\varepsilon$$

is called a system of $\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$. The associated shearlet index set is
given by

$$\Lambda^s = \{ (\varepsilon, j, \ell, k) : \varepsilon \in \{0,\ldots,d\},\ (j,\ell,k) \in \Lambda^\varepsilon \}.$$

Next, we prove that the system $\Sigma$ is a system of $\alpha$-molecules.

Theorem 5.3. Let $\alpha \in [0,1]$ and $\Sigma$ be a system of $\alpha$-shearlet molecules of order $(L, M, N_1, N_2)$. Then
$\Sigma$ constitutes a system of $\alpha$-molecules of the same order. The associated $\alpha$-shearlet parametrization
$(\Lambda^s, \Phi^s)$ is given by the map $\Phi^s(\lambda) = (s_\lambda, e_\lambda, x_\lambda) \in \mathbb{R}_+ \times \mathbb{S}^{d-1} \times \mathbb{R}^d$ for $\lambda = (\varepsilon, j, \ell, k) \in \Lambda^s$, where

sx = a^, ex=nx-Z^(^^fy xx = Si^]A-^„Z-^Tk, (15) 

and $n_\lambda = (1 + \eta_j^2 |\ell|_2^2)^{-1/2}$ a normalization constant.

In particular for $\varepsilon = 0$ we have $s_\lambda = 1$, $e_\lambda = e_d$, and $x_\lambda = Tk$ for every $\lambda = (0,0,0,k) \in \Lambda^s$.

Proof. Since a finite union of systems of $\alpha$-molecules is itself a system of $\alpha$-molecules, we can prove this
theorem separately for each system $\Sigma_\varepsilon$, $\varepsilon \in \{0,\ldots,d\}$. For $\Sigma_0$ the statement is obvious. For the other
systems it suffices to give the proof for $\varepsilon = d$, since they are all related by a mere permutation of indices.
We subsequently drop the index $\varepsilon$ to simplify the notation and note $Z^\varepsilon = I$ for $\varepsilon = d$.

For the proof we introduce the index set $\Lambda^{s,d} := \{ (d,j,\ell,k) : (j,\ell,k) \in \Lambda^d \}$. Let $\lambda = (d,j,\ell,k) \in \Lambda^{s,d}$
and $m_\lambda := m^d_{j,\ell,k}$ the associated $\alpha$-shearlet molecule with corresponding generating function $\gamma_\lambda := \gamma^d_{j,\ell,k}$.
As usual we denote the angles representing the orientation ex by {9x,<px), i.e. ex = R^^Rj^ed- The 
molecule mx can clearly be written in the form (jS]) with respect to the generator 

g\{x) := 'yx{Ai ,^Ser,,Rl^Rj^A-^^x), x G M.^. 

It remains to check condition 0 for these functions. On the Fourier side we have 

5a(0 = %iA-^.Sf^^Rl,RlAi^J), f G M". 

For A = {d,j,£, k) G A®’'^ let us first examine the matrix 

Mx:=Sf-^^Rl^Rl. (16) 

A simple calculation shows MxCd = R'^^Rj^Cd = Sf^ex = Sf^^nx{rjj£,l)^ = nxCd- Hence, the 
entries of the last column of Mx vanish except for the last one. Next, we prove the uniform boundedness 
of the set of operators {Mx}xgA^-‘‘- H holds uniformly for A G A®’'^ 

\\Mxh^2 = \\sf;^h^2 = ^ 1- 

Note that this implies that each entry in Mx is bounded in modulus. Since similar considerations hold 
for the inverse Mf^^ = Re^Rip^Sj^.^ we can conclude that both Mx := AM^MxAf,^^ and its inverse Mff^ 
have the form 

/* ... * 0\ 

* ... * 0 

yn ... □ 

where the entries * are the same as in Mx (or Mff^) and the entries □ are of the form a~A^~‘A[Mxei]d 
(or (T“Ai-«) for i g {1,..., d — 1}. In particular, the entries of Mx and Mf^^ are uniformly 

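bounded in modulus. This implies $\|\tilde M_\lambda\|_{2\to2} \lesssim 1$ and $\|\tilde M_\lambda^{-1}\|_{2\to2} \lesssim 1$. Altogether, we obtain

$$|\tilde M_\lambda \xi| \asymp |\xi| \quad \text{uniformly for } \xi \in \mathbb{R}^d \text{ and } \lambda \in \Lambda^{s,d}. \qquad (17)$$

Due to the structure of the last column of $\tilde M_\lambda$ we further have for $\xi = (\xi_1,\ldots,\xi_d)^T \in \mathbb{R}^d$

$$|\tilde M_\lambda \xi|_{[d-1]} = |\tilde M_\lambda (\xi_1,\ldots,\xi_{d-1},0)^T|_{[d-1]} \le \|\tilde M_\lambda\|_{2\to2}\, |(\xi_1,\ldots,\xi_{d-1},0)^T| = \|\tilde M_\lambda\|_{2\to2}\, |\xi|_{[d-1]}.$$

For the inverse $\tilde M_\lambda^{-1}$ it holds analogously $|\tilde M_\lambda^{-1} \xi|_{[d-1]} \le \|\tilde M_\lambda^{-1}\|_{2\to2}\, |\xi|_{[d-1]}$. We conclude

$$|\tilde M_\lambda \xi|_{[d-1]} \asymp |\xi|_{[d-1]} \quad \text{uniformly for } \xi \in \mathbb{R}^d \text{ and } \lambda \in \Lambda^{s,d}. \qquad (18)$$

Moreover, the following estimate holds uniformly for $\xi = (\xi_1,\ldots,\xi_d)^T \in \mathbb{R}^d$ and $\lambda \in \Lambda^{s,d}$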

$$|[\tilde M_\lambda \xi]_d| \lesssim \sigma^{-j(1-\alpha)} |\xi|_{[d-1]} + |\xi_d|. \qquad (19)$$

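Finally, we can prove the required condition for the functions $g_\lambda$: for every $\rho \in \mathbb{N}_0^d$ with $|\rho|_1 \le L$,

$$|\partial^\rho \hat g_\lambda(\xi)| \lesssim \sup_{|\beta|_1 \le L} \big| (\partial^\beta \hat\gamma_\lambda)(\tilde M_\lambda \xi) \big| \lesssim \frac{\min\{1, \sigma^{-j} + \sigma^{-(1-\alpha)j} |\tilde M_\lambda \xi|_{[d-1]} + |[\tilde M_\lambda \xi]_d| \}^M}{\langle |\tilde M_\lambda \xi| \rangle^{N_1} \langle |\tilde M_\lambda \xi|_{[d-1]} \rangle^{N_2}} \lesssim \frac{\min\{1, \sigma^{-j} + \sigma^{-(1-\alpha)j} |\xi|_{[d-1]} + |[\xi]_d| \}^M}{\langle |\xi| \rangle^{N_1} \langle |\xi|_{[d-1]} \rangle^{N_2}}.$$

The first estimate holds true, since $\hat g_\lambda(\xi) = \hat\gamma_\lambda(\tilde M_\lambda \xi)$ and the entries of $\tilde M_\lambda$ are uniformly bounded in $\lambda$.
The second estimate is due to (13). For the last estimate we used (17), (18), and (19). The observation
$s_\lambda = \sigma^j$ finishes the proof. □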

5.2 Consistency of the Shearlet Parametrizations 

The properties of $\alpha$-molecules depend essentially on their parametrizations. In view of Theorem 3.7, the
consistency is of particular interest when investigating approximation properties. In this paragraph we
shall prove, in Proposition 5.6, that shearlet parametrizations are consistent with each other. This allows us
to establish approximation rates for various shearlet-like constructions simultaneously, as long as they
fall under the umbrella of the shearlet-molecule concept.

Lemma 5.4. Consider the gnomonic projection $\phi : \mathbb{R}^d \setminus \{x \in \mathbb{R}^d \mid [x]_d = 0\} \to \mathbb{R}^d$, $x \mapsto x/[x]_d$, and let
$1 > c > 0$ be fixed. For $v, w \in \mathbb{S}^{d-1} \cap \{x \in \mathbb{R}^d : [x]_d \ge c\}$ we then have $|\phi(v) - \phi(w)| \asymp |v - w|$ and
$|v - w|_{[d-1]} \asymp |v - w|$.

Proof. First note that $|v - w|_{[d-1]} = |\pi(v) - \pi(w)|$, where $\pi$ is the orthogonal projection of $\mathbb{R}^d$ onto the
$(e_1,\ldots,e_{d-1})$-plane. On the set $\mathbb{S}^{d-1} \cap \{x \in \mathbb{R}^d : [x]_d \ge c\}$, the mappings $\phi$ and $\pi$ are diffeomorphisms
with bounded derivatives in both directions. This implies the statement. □
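The two equivalences of Lemma 5.4 can be observed numerically on random points of a spherical cap. The sketch below assumes the reading $\phi(x) = x/[x]_d$ of the gnomonic projection and uses hypothetical values $d = 3$, $c = 0.3$; it records the ratios of the projected and gnomonic distances to the Euclidean distance.

```python
import math
import random


def random_cap_point(d, c, rng):
    """Random unit vector in R^d whose last coordinate is at least c (rejection sampling)."""
    while True:
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in x))
        v = [t / norm for t in x]
        v[-1] = abs(v[-1])
        if v[-1] >= c:
            return v


def gnomonic(x):
    """x -> x / [x]_d, the central projection onto the hyperplane {x_d = 1}."""
    return [t / x[-1] for t in x]


def dist(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))


rng = random.Random(1)
d, c = 3, 0.3
ratios_proj, ratios_gnom = [], []
for _ in range(2000):
    v, w = random_cap_point(d, c, rng), random_cap_point(d, c, rng)
    full = dist(v, w)
    ratios_proj.append(dist(v[:-1], w[:-1]) / full)           # |v - w|_[d-1] / |v - w|
    ratios_gnom.append(dist(gnomonic(v), gnomonic(w)) / full)  # |phi(v) - phi(w)| / |v - w|
print(min(ratios_proj), max(ratios_proj), min(ratios_gnom), max(ratios_gnom))
```

Both families of ratios stay within fixed positive bounds depending only on $c$, as the lemma asserts; elementary estimates give, for instance, the range $[(1+c^{-2})^{-1/2}, 1]$ for the first ratio and $[1, c^{-1}+c^{-2}]$ for the second.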

In order to apply the previous lemma it is useful to record the following observation. 

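Lemma 5.5. Let $1 > c > 0$ be fixed and let $w = (w_1,\ldots,w_d)^T \in \mathbb{S}^{d-1}$ be a vector such that $w_d < c$.
Then there exists a point $\tilde w = (\tilde w_1,\ldots,\tilde w_d)^T \in \mathbb{S}^{d-1}$ with $\tilde w_d \ge c$ such that

$$|\tilde w - v| \le |w - v| \quad \text{for every } v \in \mathbb{S}^{d-1} \cap \{x \in \mathbb{R}^d : [x]_d \ge c\}.$$

Proof. If $w_d \le -c$ simply take $\tilde w$ to be the reflection of $w$ at the $(e_1,\ldots,e_{d-1})$-plane. Then we have
$\tilde w_d \ge c > 0$ and we can conclude for every $v = (v_1,\ldots,v_d)^T \in \mathbb{S}^{d-1}$ with $v_d \ge c > 0$

$$|v - w|^2 = |v - w|_{[d-1]}^2 + |v_d - w_d|^2 = |v - \tilde w|_{[d-1]}^2 + |v_d + \tilde w_d|^2 \ge |v - \tilde w|_{[d-1]}^2 + |v_d - \tilde w_d|^2 = |v - \tilde w|^2.$$

In the other case $|w_d| < c$ we argue as follows. Applying a rotation about the $e_d$-axis, we may assume
that $w$ is of the form $[\sqrt{1 - w_d^2}, 0, \ldots, 0, w_d]^T$. The vector $\tilde w = [\sqrt{1 - c^2}, 0, \ldots, 0, c]^T$ then has the desired
properties. To verify this it suffices to show $\langle \tilde w, v\rangle \ge \langle w, v\rangle$, because then

$$|\tilde w - v|^2 = |\tilde w|^2 + |v|^2 - 2\langle \tilde w, v\rangle = 2 - 2\langle \tilde w, v\rangle \le |w|^2 + |v|^2 - 2\langle w, v\rangle = |w - v|^2.$$

In order to show $0 \le \langle \tilde w - w, v\rangle = v_1(\sqrt{1 - c^2} - \sqrt{1 - w_d^2}) + v_d(c - w_d)$, we first observe that
$\sqrt{1 - c^2} \le \sqrt{1 - w_d^2}$ since $|w_d| < c$. Moreover, every $v \in \mathbb{S}^{d-1} \cap \{x \in \mathbb{R}^d : [x]_d \ge c\}$ satisfies $|v_1| \le \sqrt{1 - v_d^2}$. It follows

$$v_1\big(\sqrt{1 - c^2} - \sqrt{1 - w_d^2}\big) + v_d(c - w_d) \ge \sqrt{1 - v_d^2}\,\big(\sqrt{1 - c^2} - \sqrt{1 - w_d^2}\big) + v_d(c - w_d).$$

Hence, it just remains to prove the inequality $\sqrt{1 - v_d^2}\,(\sqrt{1 - c^2} - \sqrt{1 - w_d^2}) + v_d(c - w_d) \ge 0$ under
the condition $0 \le |w_d| < c \le v_d \le 1$. The associated angles $\theta_w, \theta_c, \theta_v \in [0,\pi]$ defined by $\cos\theta_w = w_d$,
$\cos\theta_c = c$, and $\cos\theta_v = v_d$, satisfy $\pi \ge \theta_w > \theta_c \ge \theta_v \ge 0$, and the inequality reads as

$$\cos(\theta_c - \theta_v) = \langle (\sin\theta_v, \cos\theta_v), (\sin\theta_c, \cos\theta_c)\rangle \ge \langle (\sin\theta_v, \cos\theta_v), (\sin\theta_w, \cos\theta_w)\rangle = \cos(\theta_w - \theta_v).$$

This however is obviously true since $0 \le \theta_c - \theta_v < \theta_w - \theta_v \le \pi$. □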
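The construction in the proof can be tested on random data. The sketch below builds $\tilde w$ for a general unit vector $w$ with $w_d < c$ (reflection if $w_d \le -c$, otherwise the point with the same azimuth and last coordinate $c$, which is the un-rotated form of the vector used in the proof) and compares distances to random cap points $v$. The values $d = 4$, $c = 0.4$ and all names are hypothetical.

```python
import math
import random


def unit(x):
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]


def dist(a, b):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(a, b)))


def replacement_point(w, c):
    """Point w~ from the proof of Lemma 5.5 for a unit vector w with w_d < c."""
    if w[-1] <= -c:
        return w[:-1] + [-w[-1]]                       # reflect at the (e_1,...,e_{d-1})-plane
    head_norm = math.sqrt(sum(t * t for t in w[:-1]))  # positive because |w_d| < c < 1
    scale = math.sqrt(1.0 - c * c) / head_norm
    return [scale * t for t in w[:-1]] + [c]           # same azimuth, last coordinate raised to c


rng = random.Random(2)
d, c = 4, 0.4
worst_gap, min_last, trials = -float("inf"), float("inf"), 0
while trials < 2000:
    w = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    v = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
    v[-1] = abs(v[-1])
    if w[-1] >= c or v[-1] < c:
        continue                                       # keep only w below and v inside the cap
    w_tilde = replacement_point(w, c)
    worst_gap = max(worst_gap, dist(w_tilde, v) - dist(w, v))
    min_last = min(min_last, w_tilde[-1])
    trials += 1
print(worst_gap, min_last)
```

In all sampled configurations the replaced point is at least as close to $v$ as the original one, up to rounding.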

After this preparation we are in the position to prove the consistency. 

Proposition 5.6. Let $\alpha \in [0,1]$ and assume that $(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$ are $\alpha$-shearlet parametrizations
(possibly with different parameters). Then $(\Lambda, \Phi_\Lambda)$ and $(\Delta, \Phi_\Delta)$ are $(\alpha,k)$-consistent for every $k > d$.

Proof. As already was noted in Remark 3.6, it suffices to prove that for $N > d$ it holds

$$\sup_{\mu\in\Delta} \sum_{\lambda\in\Lambda} \omega_\alpha(\lambda,\mu)^{-N} < \infty.$$

For this task it is convenient to decompose the shearlet index set $\Lambda = \Lambda_0 \cup \cdots \cup \Lambda_d$ into the sets $\Lambda_\varepsilon$
associated with the respective pyramidal regions $\mathcal{P}_\varepsilon$ for $\varepsilon \in \{1,\ldots,d\}$ and the low-frequency box $\mathcal{R}$ for
$\varepsilon = 0$. The sum then splits accordingly into $d+1$ parts, which we handle separately below.

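$\Lambda_0$: Let $\mu \in \Delta$ and $\lambda = (0,0,0,k) \in \Lambda_0$ with $k \in \mathbb{Z}^d$. The shearlet parametrization (15) yields $s_\lambda = 1$,
$e_\lambda = e_d$, and $x_\lambda = Tk$. Furthermore, $s_\mu \ge 1$ for all $\mu \in \Delta$. Hence we have

$$\omega_\alpha(\lambda,\mu) = s_\mu \big( 1 + |Tk - x_\mu|^2 + |\{d_{\mathbb{S}}(e_d, e_\mu)\}|^2 + |\langle e_d, Tk - x_\mu\rangle| \big) \ge s_\mu \big( 1 + |Tk - x_\mu|^2 \big).$$

We conclude

$$\sum_{\lambda\in\Lambda_0} \omega_\alpha(\lambda,\mu)^{-N} \le \sum_{k\in\mathbb{Z}^d} \big( 1 + |Tk - x_\mu|^2 \big)^{-N} \lesssim \sum_{k\in\mathbb{Z}^d} \big( 1 + |k|^2 \big)^{-N},$$

where for $N > d/2$ the sum on the right converges.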

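$\Lambda_\varepsilon$, $\varepsilon \in \{1,\ldots,d\}$: We only deal with the special case $\varepsilon = d$, since the other cases can be transformed
to this case via rotations. Let $\mu \in \Delta$ and write $s_\mu = \sigma^{j'}$ with $j' \in \mathbb{R}$. In view of $s_\lambda = \sigma^j$ for
$\lambda = (d,j,\ell,k) \in \Lambda_d$ we then have

$$\sum_{\lambda\in\Lambda_d} \omega_\alpha(\lambda,\mu)^{-N} = \sum_{j\in\mathbb{N}_0} \sigma^{-N|j-j'|} \sum_{\lambda\in\Lambda_d,\ s_\lambda=\sigma^j} \big( 1 + d_\alpha(\lambda,\mu) \big)^{-N}.$$

If we can prove that

$$S := \sum_{\lambda\in\Lambda_d,\ s_\lambda=\sigma^j} \big( 1 + d_\alpha(\lambda,\mu) \big)^{-N} \lesssim \sigma^{d|j-j'|} \qquad (20)$$

independently of $j \in \mathbb{N}_0$ and $\mu \in \Delta$, we are done, since $\sigma > 1$, $\max\{s_\lambda/s_\mu, s_\mu/s_\lambda\} = \sigma^{|j-j'|}$,
and thus if $N > d$

$$\sum_{\lambda\in\Lambda_d} \omega_\alpha(\lambda,\mu)^{-N} \lesssim \sum_{j\in\mathbb{N}_0} \sigma^{-(N-d)|j-j'|} \le 2 \sum_{j\in\mathbb{N}_0} \sigma^{-(N-d)j} = \frac{2}{1 - \sigma^{-(N-d)}} < \infty.$$

Putting in the definition of $d_\alpha(\lambda,\mu)$ and abbreviating $j_0 := \min\{j, j'\}$, the sum $S$ becomes

$$S = \sum_{\lambda\in\Lambda_d,\ s_\lambda=\sigma^j} \big( 1 + \sigma^{2\alpha j_0} |x_\lambda - x_\mu|^2 + \sigma^{2(1-\alpha) j_0} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + \sigma^{j_0} |\langle e_\lambda, x_\lambda - x_\mu\rangle| \big)^{-N}. \qquad (21)$$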

In order to prove the estimate (20) for $S$, we first study the different terms of the summand
independently. Let $\lambda = (d,j,\ell,k) \in \Lambda_d$ and recall the matrix $M_\lambda$ from (16). It holds

Ml = 

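and - according to the discussion of $M_\lambda$ in the proof of Theorem 5.3 - its last row is given by $(0,\ldots,0,n_\lambda)$
with $n_\lambda = (1 + \eta_j^2 |\ell|_2^2)^{-1/2}$. Since $\eta_j \asymp \sigma^{-j(1-\alpha)}$ and $|\ell|_2 \lesssim L_j \lesssim \sigma^{j(1-\alpha)}$, this implies $n_\lambda \asymp 1$ uniformly for
all $\lambda \in \Lambda_d$.

As a direct consequence $[M_\lambda^T x]_d = n_\lambda [x]_d \asymp [x]_d$ uniformly for $\lambda \in \Lambda_d$ and $x \in \mathbb{R}^d$. In addition,
we have $|M_\lambda^T x| \asymp |x|$ uniformly for $\lambda \in \Lambda_d$ and $x \in \mathbb{R}^d$, since $\|M_\lambda^T\|_{2\to2} = \|M_\lambda\|_{2\to2} \lesssim 1$ and also
$\|M_\lambda^{-T}\|_{2\to2} = \|M_\lambda^{-1}\|_{2\to2} \lesssim 1$.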

These observations allow the following estimate, 


\xx - Xfj,] = 


SilA-lTk-x, 


Re^Rip)^S^^,A^i'Tk Rg^R^p^Xf^ 

= \Ml{TA-^k - M-^Re^R^^Xp) \ x {A-^k - T~^M-^Re^Rp^Xf,\ 

> \A-lk-r-^S,,^x,\^^_^^ = 

In view of ex = R^^Rj^ed we also have the estimate 

\{fix,xx ~ Xfj)\ = {bxt S^^,A^iT~k — Xp) = {ed,R0)^Rp)^Sf^^,A^i'Tk — Rg^^Rp^^xl) 
= \{ed,Ml{rA-lk-M-^Rg,Rp,x ^))\x |(ed, 

= |(ed,cr'-’fc - T~^SlrtjXfj)\ . 


( 22 ) 


(23) 


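According to the shearlet parametrization (15) we have $e_\lambda = n_\lambda (\ell\eta_j, 1)^T$, where $n_\lambda \asymp 1$ as shown
above. Hence, there is a constant $c > 0$ such that $n_\lambda \ge c$ for all $\lambda \in \Lambda_d$. It follows $[e_\lambda]_d \ge c > 0$ for
all $\lambda \in \Lambda_d$. Without loss of generality we can further assume that $[e_\mu]_d \ge 0$ since $|\{d_{\mathbb{S}}(e_\lambda, -e_\mu)\}| =
|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$.

In this situation Lemma [6l3] applies and tells us that $|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| \asymp |e_\lambda - e_\mu|$. Moreover, possibly
after changing $e_\mu$ to $\tilde e_\mu$ as in Lemma 5.5 to enforce $[\tilde e_\mu]_d \ge c$, we have $|e_\lambda - e_\mu| \ge |e_\lambda - \tilde e_\mu|$ for all $\lambda \in \Lambda_d$.

Let $\phi$ be the gnomonic projection from Lemma 5.4. Then $\phi(e_\lambda) = (\ell\eta_j, 1)^T \in \mathbb{R}^d$ and Lemma 5.4
together with the observation $|\phi(v) - \phi(w)|_{[d-1]} = |\phi(v) - \phi(w)|$ implies the estimate

$$|e_\lambda - \tilde e_\mu| \asymp |(\ell\eta_j, 1)^T - \phi(\tilde e_\mu)| = |(\ell\eta_j, 1)^T - \phi(\tilde e_\mu)|_{[d-1]}.$$

Subsequently, let $y_\mu \in \mathbb{R}^{d-1}$ be defined by $\phi(\tilde e_\mu) = (y_\mu, 1)^T$. Altogether we arrive at

$$|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| \gtrsim |(\ell\eta_j, 1)^T - \phi(\tilde e_\mu)|_{[d-1]} = |\ell\eta_j - y_\mu|. \qquad (24)$$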


We now use (l22l) . (|M1) and (|M)) to estimate the sum S in eiD- Introducing the quantities qi(£) '■= 
SirijXfi, q 2 {£) ■= SirfjXfj,, and <73 := r]fiyp, and taking into account rjj x we 

obtain 


fceza 


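Next, we distinguish the cases $j > j'$ and $j \le j'$. For $j \le j'$ we have $j_0 = j$ and we obtain

$$S \lesssim \sum_{k\in\mathbb{Z}^d} \sum_{\ell\in\mathbb{Z}^{d-1}} \big( 1 + |k - q_1(\ell)|_{[d-1]}^2 + |\langle e_d, k - q_2(\ell)\rangle| + |\ell - q_3|^2 \big)^{-N} \lesssim \sum_{k\in\mathbb{Z}^d} \sum_{\ell\in\mathbb{Z}^{d-1}} \big( 1 + |k|_{[d-1]}^2 + |k_d| + |\ell|^2 \big)^{-N} < \infty.$$

In case $j > j'$ it holds $j_0 = j'$ and $p := j' - j < 0$. The term $\sigma^{dp} S$ is - up to a multiplicative constant -
bounded by

$$\sigma^{dp} \sum_{\ell\in\mathbb{Z}^{d-1}} \sum_{k\in\mathbb{Z}^d} \big( 1 + \sigma^{2\alpha p} |k - q_1(\ell)|_{[d-1]}^2 + \sigma^{p} |\langle e_d, k - q_2(\ell)\rangle| + \sigma^{2(1-\alpha)p} |\ell - q_3|^2 \big)^{-N}.$$

The last sum can be interpreted as a Riemann sum, which is bounded - up to a multiplicative constant
independent of $j$ and $j'$ as long as $j > j'$ - by the corresponding integral

$$\int\!\!\int \big( 1 + |x - \sigma^{\alpha p} q_1(y)|_{[d-1]}^2 + |\langle e_d, x - \sigma^{p} q_2(y)\rangle| + |y - \sigma^{(1-\alpha)p} q_3|^2 \big)^{-N} \, dx \, dy.$$

All in all we end up with $S \lesssim \sigma^{-dp} \int\!\!\int ( 1 + |x|_{[d-1]}^2 + |\langle e_d, x\rangle| + |y|^2 )^{-N} \, dx \, dy$. To see that
the integral converges for $N > d$, we carry out the integration over $x_d$, which yields up to a fixed constant

$$\int_{\mathbb{R}^{d-1}} \int_{\mathbb{R}^{d-1}} \frac{dx \, dy}{\big( 1 + |x|_{[d-1]}^2 + |y|^2 \big)^{N-1}} = \int_{z\in\mathbb{R}^{2(d-1)}} \frac{dz}{\big( 1 + |z|^2 \big)^{N-1}}.$$

The integral on the right converges precisely for $N > d$. This observation concludes the proof.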


□ 


5.3 Pyramid-adapted Shearlet Systems in 3D

In the sequel we focus on some concrete shearlet systems in 3 dimensions, which are already established
in the literature and for which we verify that they are instances of $\alpha$-shearlet molecules for $\alpha = \frac{1}{2}$ and thus by
Theorem 5.3 also systems of 3-dimensional parabolic molecules.

Let us first recall the classic definition [39] of a shearlet system in $L^2(\mathbb{R}^3)$. A system of 3D-shearlets
is defined as a collection of functions in $L^2(\mathbb{R}^3)$ of the form

{iA,.^.fc = 2V(5M| 2--/c) : j eZ,eeZ'^,keZ^Y (25) 

where ip G is a suitable generating function. The classical choice for the generator is a function 

Ip defined on the frequency domain by 


m = e = (a,6,6)^eK3, (26) 

where v G is a bump function and w G C^{M.) the Fourier transform of a suitable univariate 

discrete wavelet. It is possible to choose $v$ and $w$ so that (25) becomes a Parseval frame for $L^2(\mathbb{R}^3)$.

Unfortunately, the shearlet system (25) has a directional bias: for large shearings
the frequency support of the elements becomes more and more elongated along the $(e_1, e_2)$-plane. This
directional bias negatively affects the approximation properties of (25) and makes this system impractical
in most applications.

To avoid this problem, the Fourier domain is usually partitioned into three pyramidal regions 

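$$\mathcal{P}_1 = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_2/\xi_1| \le 1, \ |\xi_3/\xi_1| \le 1\},$$

$$\mathcal{P}_2 = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_1/\xi_2| \le 1, \ |\xi_3/\xi_2| \le 1\},$$

$$\mathcal{P}_3 = \{(\xi_1,\xi_2,\xi_3) \in \mathbb{R}^3 : |\xi_1/\xi_3| \le 1, \ |\xi_2/\xi_3| \le 1\},$$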

and for each pyramid a separate shearlet system is used. Then, since each system only has to cover one 
pyramid, the shear parameters can be restricted avoiding large shears. To take care of low frequencies, it 
is common to use distinguished coarse-scale elements with frequencies in a centered box. Here this will 
be the cube 

7^={^eR3 ^ 1^1^ <i}. 
This cube together with the truncated pyramids $\tilde{\mathcal{P}}_1 := \mathcal{P}_1 \setminus \mathcal{R}$, $\tilde{\mathcal{P}}_2 := \mathcal{P}_2 \setminus \mathcal{R}$, and $\tilde{\mathcal{P}}_3 := \mathcal{P}_3 \setminus \mathcal{R}$ partitions
the Fourier domain into 4 regions.
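
The partition into a centered cube and three pyramids can be illustrated with a minimal sketch. The function name `fourier_region` and the cube half-width `r` are assumptions made for this illustration only; frequencies on a face shared by two pyramids are assigned to the lower index by convention.

```python
def fourier_region(xi, r):
    """Return 0 for the centred cube, else the index (1, 2 or 3) of a pyramid containing xi.

    xi is a frequency vector (xi1, xi2, xi3); r is the half side length of the
    cube (a hypothetical parameter for this sketch).
    """
    a = [abs(c) for c in xi]
    if max(a) <= r:
        return 0
    # xi lies in the pyramid whose axis is the coordinate of largest modulus,
    # since then both quotients by that coordinate are bounded by 1.
    # list.index returns the first maximiser, so ties go to the lowest index.
    return a.index(max(a)) + 1


print(fourier_region((0.01, 0.02, 0.0), r=0.125))
print(fourier_region((3.0, -1.0, 2.0), r=0.125))
print(fourier_region((0.5, -4.0, 1.0), r=0.125))
print(fourier_region((1.0, 2.0, -7.0), r=0.125))
```

Every frequency vector receives exactly one label, which mirrors the statement that the cube and the truncated pyramids cover the Fourier domain.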

With each of these regions, different operators are associated. The coarse-scale functions are only 
translated, in the other regions we also scale and shear. The scaling and shearing operators associated 
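with the respective regions $\varepsilon \in \{1, 2, 3\}$ are given by $A_s^{(\varepsilon)} = Z^{\varepsilon} A_s Z^{-\varepsilon}$ and $S_h^{(\varepsilon)} = Z^{\varepsilon} S_h Z^{-\varepsilon}$, and take
the concrete form

$$A_s^{(1)} = \begin{pmatrix} s & 0 & 0 \\ 0 & s^{1/2} & 0 \\ 0 & 0 & s^{1/2} \end{pmatrix}, \quad
A_s^{(2)} = \begin{pmatrix} s^{1/2} & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s^{1/2} \end{pmatrix}, \quad
A_s^{(3)} = \begin{pmatrix} s^{1/2} & 0 & 0 \\ 0 & s^{1/2} & 0 \\ 0 & 0 & s \end{pmatrix},$$

for $s > 0$, and for $h = (h_1, h_2)^T \in \mathbb{R}^2$

$$S_h^{(1)} = \begin{pmatrix} 1 & h_1 & h_2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
S_h^{(2)} = \begin{pmatrix} 1 & 0 & 0 \\ h_2 & 1 & h_1 \\ 0 & 0 & 1 \end{pmatrix}, \quad
S_h^{(3)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ h_1 & h_2 & 1 \end{pmatrix}.$$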
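
The conjugation by powers of the cyclic permutation matrix can be checked with a small sketch. It assumes that $Z$ acts by $Ze_1 = e_2$, $Ze_2 = e_3$, $Ze_3 = e_1$ and that the unconjugated matrices are those of the third pyramid; both are assumptions of this illustration, and the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


# Assumed cyclic permutation: Z e1 = e2, Z e2 = e3, Z e3 = e1.
Z = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]


def conjugate(M, eps):
    """Return Z^eps M Z^(-eps); Z is orthogonal, so its inverse is its transpose."""
    for _ in range(eps % 3):
        M = matmul(matmul(Z, M), transpose(Z))
    return M


def base_scaling(s):
    """Parabolic scaling with the strong scaling along the third axis."""
    return [[s ** 0.5, 0, 0], [0, s ** 0.5, 0], [0, 0, s]]


def base_shear(h1, h2):
    """Shear whose parameters sit in the third row."""
    return [[1, 0, 0], [0, 1, 0], [h1, h2, 1]]


for eps in (1, 2, 3):
    print(eps, conjugate(base_scaling(4), eps), conjugate(base_shear(5, 7), eps))
```

Under these assumptions the conjugated matrices reproduce the three concrete forms listed for each pyramid.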


Now we are ready to define a modified shearlet system, which is adapted to our partition of the Fourier
domain and therefore called cone-adapted. Unlike (25), these systems do not exhibit a directional bias. In
the 3D setting they are called pyramid-adapted 3D shearlet systems.

Definition 5.7 ([41] |42]). For fixed ti,T 2 > 0 let T = diag(ri, T 2 , T 2 ) G The (affine) pyramid- 

adapted 3D shearlet system generated by the functions G and G L^(R^), e G {1,2,3}, is 

defined as the union 

:= $(<();ti) U 4'i(' 0V n, r 2 ) U n, r 2 ) U n, T 2 ) (27) 

of the coarse-scale functions $(</>; ti) := {4>k = 4>i‘ — Tik) : k G and the functions 

^,(V.^;Ti,r2) := = 2^r{Z^SiA{^^Z-^ ■ -Z^TZ-^k) : j G No,£ G Z^, \t\^ < \2^/X,k& Z^} 

associated with the pyramids Ve for e G (1, 2, 3}. 

These pyramid-adapted affine systems are the prime examples of $\frac{1}{2}$-shearlet molecules. In practice,
one wants them to be frames, especially tight frames. However, the construction of tight frames of
pyramid-adapted shearlets is not trivial.

The simplest way to obtain a Parseval frame of pyramid-adapted shearlets starts with a Parseval 
shearlet frame of the type ([25]), which is easier to construct. It yields a shearlet system associated with 
the pyramid Vs by removing all elements, whose frequency support does not intersect V 3 . Truncating the 
remaining functions in the frequency domain outside of P 3 , one obtains a Parseval frame for the space 

LX'Psy ■■= {/ e l2(m3) . s^pp f ^ 


A similar procedure yields Parseval frames associated with the other parts of the Fourier domain,
namely for , e G (1, 2}, and . The union of these frames then is a Parseval frame for the 

whole space L^(R^). 

This approach has the drawback that it leads to bad spatial localization of the shearlets due to 
their lack of smoothness in the frequency domain, which is a consequence of the truncation. A different 
approach was taken by Candes, Demanet, and Ying in [33]. They gave up on the affine structure of 
the system and found a shearlet-like construction of a Parseval frame. Guo and Labate modified this 
approach |331134] and found another shearlet-type construction, which is even close to affine. 


5.3.1 Bandlimited Tight Frames of Pyramid-adapted Shearlets 

The construction we present here is the one due to Guo and Labate [531 [31] ■ It starts with a Meyer 
scaling function G =5^(R) satisfying 0 < (^ < 1, supp ^ C [—|, |], and = 1 on [— 3 ^, which is used 
to define <I> G oS^(M^) on the Fourier side by 

$(0 := kX)ki 2 )m), f G R^ (28) 
Then the function




is defined, with supp W C [— 5 , 5]^\(—fT = 1 on such that + 

= 1-for every ^ S R^. These functions thus produce a smooth tiling of the frequency 
domain into cartesian coronae. To take care of the directional scaling, a bump-like function v € (K) 

is used, which satisfies supp v C [—1,1] and 

$$|v(t-1)|^2 + |v(t)|^2 + |v(t+1)|^2 = 1 \quad \text{for } t \in [-1,1],$$
$$v(0) = 1 \quad \text{and} \quad v^{(n)}(0) = 0 \ \text{for } n \ge 1.$$

For an explicit construction of such a function we refer to [31]. Then $V \in C^\infty(\mathbb{R}^3)$ is defined by

$$V(\xi) = v(\xi_1/\xi_3)\, v(\xi_2/\xi_3), \quad \xi = (\xi_1,\xi_2,\xi_3)^T \in \mathbb{R}^3,$$

with support fully contained in the pyramid $\mathcal{P}_3$. After this preparation the smooth Parseval frame of
band-limited 3D-shearlets introduced by Guo and Labate in [34] can be defined.

The coarse-scale functions, which take care of the low frequencies in $\mathcal{R}$, are translates of the function
$\Phi$ from (28):

$$SH_0 := \{\psi^0_{0,0,k} = \Phi(\cdot - k) : k \in \mathbb{Z}^3\}. \qquad (29)$$


The shearlets, whose frequency support is fully contained in the respective pyramids, are called interior 
shearlets. They are defined as the collection of functions 

' (^ 5 1? ^ ^int\ 

indexed by h-int '■= {[e,j,(.,k) G {1,2,3} x Nq x x : \t\oo < 2^} and with Fourier transforms 

exp(-2m(Z^Si^A7^^Z-^C k)), ^ G (30) 

where W and V are the auxiliary functions from above. Observe that 

= z;(2^| - 4)u(2^| - 4), 

and compare this to Finally, the so-called boundary shearlets, obtained by carefully glueing together 
shearlets from adjacent pyramidal regions, are added to obtain a smooth well-localized frame. The glueing 
process is rather delicate, since it is important to select the right shearlets matching together. In fact, 
even in the original construction [34] there are some inaccuracies. We correct them here, which leads to 
a slight modification of the original definition. 

We make a distinction between boundary shearlets defined between two pyramidal regions and those
at the corners, where three pyramidal regions meet. Following [53] we also distinguish between the scales
$j \ge 1$ and $j = 0$.

Let us begin with the scales $j \ge 1$ and the functions at the boundary where only two pyramidal
regions meet. We define, for $j \ge 1$, $\epsilon_1 \in \{-1,1\}$, $\ell_1 = \epsilon_1 2^j$, $|\ell_2| < 2^j$, and $k \in \mathbb{Z}^3$,


'^j,£,kiO 


'^h.kiO 


'4’^,l,kiO 


2-2j-3^(2-2j^)„(2j| _ 4)w(2J| - ^2)exp(-27rf(2-2ZS'7^Ai_2-2.Z-iC,fc)) ,fGVi; 
2-2.-3p^(2-2j^)^(2J| _ 4)^;(2^| - ei4)exp(-27r*(2-2Z5,-^A7^4Z-i^,fc)) G IP 2 ; 

2-2^-3w( 2-2JC)«(2^| - 4)r;(2J| - i2)expi-2Tri{2-^Z^Sl^Af^^Z-^^,k)) ,^GV2; 

2-2J-3w( 2-2JO^;(2^| - 4)«(2^| - £i4)exp(-27rf(2-2^2,57^AT;'4Z-2^,fc)) G IP 3 ; 

2-2^-3W(2-2^e)^^(2^|-4)r;(2^|-4)exp(-27ri(2-2,57^A7;'4?,fc)) ,^gPs-, ^ 

2-2J-3w(2-2je)^;(2^| - 4)^;(2^| - £i4) exp(-27rf(2-2,S7^A7;’4^, fc)) 
Note that we only give the definition in the regions where the functions have non-trivial support.
Outside they are supposed to be zero. 

Next, we come to the corners where three pyramidal regions meet. Here we have to glue together
three shearlet parts, each coming from a different pyramid. For convenience the corner elements will be
associated with the first pyramid $\mathcal{P}_1$. For $j \ge 1$, $\ell_1 = \epsilon_1 2^j$, $\ell_2 = \epsilon_2 2^j$ with $\epsilon_1, \epsilon_2 \in \{-1,1\}$, and $k \in \mathbb{Z}^3$,
they are defined as follows:


r- ^2)exp(-27ri(2-2Z57^H7;4Z-ie,ft)) 

= \ - ei^2)exp(-27r*(2-2Z57^H7^;jZ-i^,fc)) S IP 2 ; 

[2-2J-3lF(2-2je)^^(2^| - £24 )i^(2^| - i2)eM-2ni{2-^ZS^^A^^^Z-^^,k)) 

(32) 


As in the original construction, the definition of the boundary shearlets is slightly different at the
lowest scale $j = 0$. For $j = 0$, $\ell_1 = \pm 1$, $\ell_2 = 0$, and $k \in \mathbb{Z}^3$, we set


'^jAkiO 

A/,k(0 


WiOvi^ - £i)vi^)exp{-2Tri{^,k)) ,^GVi; 

- h)vi^)exp{-2Tri{^,k)) ,^GV2] 

W{Ov{^ - £i)«(|) exp(-2^z(e, k)) , ^ G P 2 ; 

W{Ov{^ - exp(-2^*(e, k)) , ^ G IP 3 ; 

- 4)^(|) exp(-2^*(e, k)) , ^ G V 3 ; 

- ei)v{^)exp{-27ri{^,k)) ,^G'Pi. 


(33) 


Finally, we come to the corner elements at the scale $j = 0$. Again they are associated with the first
pyramid. Let $\epsilon_1, \epsilon_2 \in \{-1,1\}$. Then we define them for $j = 0$, $\ell = (\epsilon_1, \epsilon_2)$, and $k \in \mathbb{Z}^3$ by


fW ^( 6'('(|7 -■^ i )' y (|7 -■^ 2 ) exp (- 27 ri (^, fc )) 

A,eA0 = ,^GP2; (34) 

-£2h)v{^ - £2)exp(-27rz(^,fc)) G ^ 3 . 

All boundary shearlets are collected in the family $SH_{\mathrm{bound}}$. Together with the coarse-scale functions
$SH_0$ and the interior shearlets $SH_{\mathrm{int}}$ they provide a Parseval frame for $L^2(\mathbb{R}^3)$.

Theorem 5.8 ([31]). The system of 3D-shearlets


$$SH := SH_0 \cup SH_{\mathrm{int}} \cup SH_{\mathrm{bound}}$$


is a smooth (well-localized) Parseval frame for $L^2(\mathbb{R}^3)$ consisting of band-limited Schwartz functions.

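The corresponding index set $\Lambda_{SH} \subset \{0,1,2,3\} \times \mathbb{N}_0 \times \mathbb{Z}^2 \times \mathbb{Z}^3$ is given by

$$\Lambda_{SH} := \{(0,0,0,k) : k \in \mathbb{Z}^3\} \cup \{(\varepsilon,j,\ell,k) : \varepsilon \in \{1,2,3\}, \ j \in \mathbb{N}_0, \ \ell \in \mathcal{L}_{\varepsilon,j} \subset \mathbb{Z}^2, \ k \in \mathbb{Z}^3\} \qquad (35)$$

with the shear parameters $\mathcal{L}_{\varepsilon,j} = \{(\ell_1,\ell_2) \in \mathbb{Z}^2 : |\ell_1| \le 2^j, \ |\ell_2| < 2^j\}$ for $\varepsilon \in \{2,3\}$ and $\mathcal{L}_{1,j} = \{(\ell_1,\ell_2) \in
\mathbb{Z}^2 : |\ell_1| \le 2^j, \ |\ell_2| < 2^j\} \cup \{(\pm 2^j, \pm 2^j)\}$ at each scale $j \in \mathbb{N}_0$.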

The system SH is another instance of a system of $\frac{1}{2}$-shearlet molecules (at least after an appropriate
re-indexing). In particular, it falls within the framework of $\alpha$-molecules for $\alpha = \frac{1}{2}$.

Proposition 5.9. Appropriately re-indexed, the smooth Parseval frame of band-limited shearlets SH
constitutes a system of 3-dimensional $\frac{1}{2}$-shearlet molecules of order $(\infty,\infty,\infty,\infty)$.

Proof. We first re-index the coarse-scale functions (29) as well as the boundary elements (33) and (34)
at scale $j = 0$. For this we utilize the index set $\Gamma \subset \{0,1,2,3\} \times \mathbb{N}_0 \times \mathbb{Z}^2$ given by

$$\Gamma := \{(0,0,0)\} \cup \{(\varepsilon,0,\ell) : \varepsilon \in \{1,2,3\}, \ \ell = (\pm 1,0)\} \cup \{(1,0,\ell) : \ell = (\pm 1,\pm 1)\}.$$
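The functions we want to re-index are then precisely those functions $\psi^{\varepsilon}_{j,\ell,k} \in SH$ with $(\varepsilon,j,\ell,k) \in \tilde{\Lambda} :=
\Gamma \times \mathbb{Z}^3$. It holds $\#\Gamma = 11$, and we enumerate the set $\Gamma$ in lexicographic order from 0 to 10, described
by a bijective function $\mathcal{N} : \Gamma \to \{0,\dots,10\}$. For $(\varepsilon,j,\ell,k) \in \tilde{\Lambda}$ we then re-index as follows. Writing
$\tilde{k}(\varepsilon,j,\ell,k) := (k_1, k_2, 11 \cdot k_3 + \mathcal{N}(\varepsilon,j,\ell)) \in \mathbb{Z}^3$, the re-indexed elements are given by

$$\tilde{\psi}^{0}_{0,0,\tilde{k}(\varepsilon,j,\ell,k)} := \psi^{\varepsilon}_{j,\ell,k} \quad \text{for } (\varepsilon,j,\ell,k) \in \tilde{\Lambda}.$$

For $(\varepsilon,j,\ell,k) \in \Lambda_{SH} \setminus \tilde{\Lambda}$ the functions $\tilde{\psi}^{\varepsilon}_{j,\ell,k} := \psi^{\varepsilon}_{j,\ell,k}$
remain the same. The relabelling is thus given by

$$\Gamma_{\mathrm{label}} : \Lambda_{SH} \to \tilde{\Lambda}_{SH}, \quad (\varepsilon,j,\ell,k) \mapsto
\begin{cases}
(0,0,0,\tilde{k}(\varepsilon,j,\ell,k)), & (\varepsilon,j,\ell,k) \in \tilde{\Lambda}; \\
(\varepsilon,j,\ell,k), & (\varepsilon,j,\ell,k) \notin \tilde{\Lambda}.
\end{cases} \qquad (36)$$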
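
The re-indexing can be sketched in a few lines. The names `GAMMA`, `ENUM`, and `relabel` are hypothetical; the sketch only assumes the index set of 11 triples, its lexicographic enumeration, and the interleaving of the third translation coordinate described above.

```python
# Index triples (eps, j, l) that are moved to the coarse scale: one coarse-scale
# entry, six boundary entries at scale 0, and four corner entries at scale 0.
GAMMA = sorted(
    [(0, 0, (0, 0))]
    + [(eps, 0, (l1, 0)) for eps in (1, 2, 3) for l1 in (-1, 1)]
    + [(1, 0, (l1, l2)) for l1 in (-1, 1) for l2 in (-1, 1)]
)
# Lexicographic enumeration of the triples by the numbers 0, ..., 10.
ENUM = {g: n for n, g in enumerate(GAMMA)}


def relabel(eps, j, l, k):
    """Map an index (eps, j, l, k) to its label after re-indexing."""
    key = (eps, j, tuple(l))
    if key in ENUM:
        k1, k2, k3 = k
        # Interleave the 11 families along the third translation coordinate.
        return (0, 0, (0, 0), (k1, k2, 11 * k3 + ENUM[key]))
    return (eps, j, tuple(l), tuple(k))


print(len(GAMMA))
print(relabel(1, 0, (1, -1), (2, 3, 4)))
print(relabel(2, 3, (1, 0), (2, 3, 4)))
```

Since every integer can be written uniquely as $11 k_3 + n$ with $n \in \{0,\dots,10\}$, the map is injective on the re-indexed part and exhausts the coarse-scale translations.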


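The newly obtained system

$$\widetilde{SH} := \{\tilde{\psi}^{\varepsilon}_{j,\ell,k}\}_{(\varepsilon,j,\ell,k) \in \tilde{\Lambda}_{SH}} \qquad (37)$$

is equipped with a shearlet index set $\tilde{\Lambda}_{SH} \subset \{0,1,2,3\} \times \mathbb{N}_0 \times \mathbb{Z}^2 \times \mathbb{Z}^3$ similar to (35), however with the
modified shear parameters $\mathcal{L}_{1,0} = \mathcal{L}_{2,0} = \mathcal{L}_{3,0} = \{(0,0)\}$ at scale $j = 0$.

In the remainder we show that (37) is a system of $\frac{1}{2}$-shearlet molecules of order $(\infty,\infty,\infty,\infty)$. As
parameters we fix $\sigma = 4$ and $\eta_j = 2^{-j}$ for $j \in \mathbb{N}_0$. The translation parameters $T = \operatorname{diag}(\tau_1,\tau_2,\tau_3)$ vary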
with the indices and are chosen suitably later. Clearly, we have 5^A} = A\ ■ Thus, defining 

7 },,,fc(x) := + Tk)), x G (38) 


for every {e,j,£,k) G Ash we get the desired representation (fT^ . i.e. 


- Tk) = 22^7|,,,,(Z-Ai 


^Sej^.Z ^x 


-Tk), xG 


On the Fourier side the generators (1551) take the form 


Z-^Cj expi27Tt{^,Tk)), e G (39) 

and it remains to show that these functions satisfy m- For this task we distinguish between the coarse- 
scale elements, the interior shearlets and the boundary shearlets of dSZl). 


Coarse-Scale: The coarse-scale elements {tpQ q ^Ifegza of (1571) are precisely those functions ^ f, G SH 
where {e,j, £, k) G A. For them (1551) simplifies to 7 ° q ^ = "(/iQ q j,(- -I- Tk), and we just need to show (1T51) 
for 7 Q 0 fe with k G Z^. For this T = diag(l, 1, T-) is the right choice. A calculation yields for every 
(e, j, £,k) G A and k = k{e, j, £, k) G Z^ 


^o,o,fe(£j7.fc)(^) =^ydfe(^)exp(fr^3 ■Ar(e,j,£)), ^ = (6,6,6)'^ e 

Hence, looking at the definitions (|55)) . (1551) . (1551) . it follows 7q 0 k ^ C))“(]R^) with support in [—i, i]^. 

Interior Shearlets: Choosing T = diag(l, 1,1), Equation (1591) together with (150l) yields for ^ G 

llGkiO = W{Z-M,^,Z--0V{Z--0, 

where the matrix Mj^i := 2~^^Ap ST has the form 

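$$M_{j,\ell} = \begin{pmatrix} 2^{-j} & 0 & \ell_1 2^{-j} \\ 0 & 2^{-j} & \ell_2 2^{-j} \\ 0 & 0 & 1 \end{pmatrix}. \qquad (40)$$

We check the required condition exemplarily for the case $\varepsilon = 3$, where $Z^{\varepsilon} = I$. A calculation yields

$$M_{j,\ell}\,\xi = \begin{pmatrix} 2^{-j}(\xi_1 + \ell_1\xi_3) \\ 2^{-j}(\xi_2 + \ell_2\xi_3) \\ \xi_3 \end{pmatrix}, \quad \xi = (\xi_1,\xi_2,\xi_3)^T \in \mathbb{R}^3.$$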

Subsequently we prove that for j G No and |£|oo < 2-^ 

supp C P 3 n e ^ < ICsl < ^}- (41) 

First observe that clearly supp (yfy^fe) ^ 'Ps- Now take ^ G V 3 and ^ ^ G : ^ < |^ 3 | < ^}. With 
1 ^ 3 ! > i also |[Afjy^] 3 | > i and thus ^ supp W. If on the other hand l^sl < ;^ then |[Mjy^] 3 | < ^ 
and, since j > 0 , |^i| < |^ 3 | and |£|oo < 2 -^, 

\{MjM = |2-^(6 + ^i6)l < 2-^(1 + KiDICsl < 2-^'(l + 2^')|6I < (1 + 2-^)|6l < 2|6I < 1^- 

Analogously, one obtains |[Mjy^] 2 | < Altogether, this yields |Mjy^|oo < if l^sl < which implies 

Mj^i^ ^ supp W. Consequently, 7 ^^ = 0 outside 7^3 n G : ^ < l^s] < i}, which proves (HTl) . 

Using analogous estimates for e G {2,3} it follows that supp Jj^e^k — [“!’every 
{e,j,£,k) G Knt = {(e,j,£,fc) G {1,2,3} x No x x : \£\^ < 2^}. 

The derivatives of 7 }^ f. are linear combinations of functions d^W{Z^^)d^V{Z ~^with co¬ 
efficients uniformly bounded, since the entries of are. This fact together with the support condition 
implies that (fT^ is fulfilled for arbitrary order. 

Boundary Shearlets: The boundary shearlets of (1371) satisfy j > 1 and are precisely the functions 
in iiD and (I32[) . We exemplarily handle the functions (1311) . the argumentation for the functions (1321) is 
similar. Let us first look at the boundary between Vi and V 3 , i.e. the functions ip'j e where e = 3, j > 1, 
£1 = ±2f, |£ 2 | < 2f, and k G Z^. Plugging (IHTl) into (IMl) and choosing T = ;|diag(l, 1,1) we obtain 

.3 ^ 2-3 / W{M,,,OViO , e e M-JV,; 

\w{Mj,eOViN^A) >Ce7W-/iPi; 

for the generators, where Mj i is the matrix (gni) and where 

/ -£i 2 -J 0 2 ^-£p-^ \ 

:= - sgn(£i)£22-^' 1 £2 - 2-^ |£i|£2 . 

\ 2-J 0 2-f£i J 

Similar calculations as for the interior shearlets then prove 

supp (7^7.,) C G R3 : |||<2,|||<4}n{^GR3 : ^ < 1 }. 

For e G {1,2} complementary results hold true. Altogether, it follows supp C [—2,2]3\(—^)3 

for the generators of the functions m- Hence, condition (fT51) is fulfilled for arbitrary order. □ 

Proposition 5.9 in particular shows that the system SH is a system of $\frac{1}{2}$-molecules, however not
with respect to a shearlet parametrization because of the necessary relabelling $\Gamma_{\mathrm{label}}$ in (36). The actual
parametrization is given by

$$\Phi_{SH} := \tilde{\Phi}_{SH} \circ \Gamma_{\mathrm{label}},$$

where $\tilde{\Phi}_{SH}$ denotes the $\frac{1}{2}$-shearlet parametrization of the relabelled system.

Corollary 5.10. The smooth shearlet frame SH is a system of 3-dimensional $\frac{1}{2}$-molecules of order
$(\infty,\infty,\infty,\infty)$ with respect to the parametrization $(\Lambda_{SH}, \Phi_{SH})$.

Remark 5.11. Although $(\Lambda_{SH}, \Phi_{SH})$ is not a shearlet parametrization, it clearly is $(\frac{1}{2}, k)$-consistent with
every ^-shearlet parametrization for k > 3. This follows from Provosition 1 5. f)\ and the observation that 
relabelling of elements does not make any difference here. 
5.3.2 Compactly Supported Pyramid-adapted Shearlets

There also exist shearlet frames for and L^(R^) consisting of compactly supported functions. 

Compactly supported frames of the form (E71) have been constructed in [HI |37] • They are also instances 
of i-shearlet molecules and their order can be controlled by the regularity of the generators. 

Proposition 5.12. Let be compactly supported and L, M, Ni, N 2 S Nq U {00}. If 

4> G and if, for every e G {1, 2, 3}, 

(i) the derivatives exist and are continuous for every 7 G Ng with [^'^7)2 < fVi + iV2 and 

< Ni, where Z is the cyclic permutation matrix (ITT]) . 

(ii) the generator has M + L directional vanishing moments in e^-direction, i.e. 

V(a;i, ^2) G : f {Z^ x)x^ dx^ = Q for every N & {Q,... ,M + L — 1}, 

JK. 


then the system (EH) obtained from these generators is a system of ^-shearlet molecules of order [L, M, Ni, N 2 ). 

Proof. It is obvious that - rightly indexed - a system of the form (EZl) constitutes a system of 
^-shearlet molecules. Hence we just need to verify the order of the system. For this, little more is 
needed than utilizing the facts that spatial decay implies smoothness in Fourier domain (and vice versa), 
and that vanishing moments in spatial domain implies estimates of the form | 5 (^)| < min(l,|^|)^ in 
Fourier domain. We refer to EH Proposition 3.11] for details, where a similar two-dimensional version of 
the theorem is proven. □ 


6 Proof of Theorem 2.5

This final section is devoted to the technical proof of Theorem 2.5. It is split up into several pieces and
has the same general structure as the proof of the corresponding 2-dimensional result ESJ Theorem 4.2]. 
In d dimensions however it takes more effort and the arguments are more involved. 


6.1 Auxiliary Lemmas 

Let us first collect some elementary facts, which turn out to be useful. Subsequently $O(d,\mathbb{R})$ shall
denote the orthogonal group of $\mathbb{R}^d$. Recall also the notation $\{\theta\}$ for the 'projection' of $\theta \in \mathbb{R}$ onto the
interval $[-\frac{\pi}{2}, \frac{\pi}{2})$ in the sense of Subsection 2.1.

Lemma 6.1. For $\theta \in \mathbb{R}$ let $\{\theta\}$ denote its 'projection' onto the interval $[-\frac{\pi}{2}, \frac{\pi}{2})$ as introduced in Subsection 2.1.
It then holds $|\{\theta\}| \asymp |\sin(\theta)|$.


Proof. Due to $\pi$-periodicity it suffices to verify the relation for $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2})$. In this range we have
$\frac{2}{\pi}|\theta| \le |\sin(\theta)| \le |\theta|$. □
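
The two-sided bound of Lemma 6.1 can be checked numerically with a minimal sketch. It assumes that the 'projection' $\{\theta\}$ is the representative of $\theta$ modulo $\pi$ in $[-\frac{\pi}{2}, \frac{\pi}{2})$; the function names and the tolerance are choices made for this illustration.

```python
import math


def project(theta):
    """Representative of theta modulo pi in the interval [-pi/2, pi/2)."""
    return (theta + math.pi / 2) % math.pi - math.pi / 2


def check(theta, tol=1e-9):
    """Test (2/pi)|{theta}| <= |sin(theta)| <= |{theta}| up to rounding."""
    p = abs(project(theta))
    s = abs(math.sin(theta))
    return (2 / math.pi) * p <= s + tol and s <= p + tol


# Sample the relation on a grid that covers several periods.
print(all(check(0.01 * n) for n in range(-2000, 2000)))
```

The constants $\frac{2}{\pi}$ and $1$ are exactly the implicit constants hidden in the relation $|\{\theta\}| \asymp |\sin(\theta)|$.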

An immediate corollary is the following result. Recall the notation $d_S(v,w)$ for the angle $\arccos(\langle v,w\rangle) \in
[0,\pi]$ between two vectors $v,w \in \mathbb{S}^{d-1}$.

Lemma 6.2. Let $e_d \in \mathbb{R}^d$ be the $d$-th unit vector. For $\eta \in \mathbb{S}^{d-1}$ we have $|\{d_S(\eta,e_d)\}| \asymp |\eta|_{[d-1]}$.

Proof. Using a suitable rotation $R \in O(d,\mathbb{R})$ of the form

$$R = \begin{pmatrix} R_{d-1} & 0 \\ 0 & 1 \end{pmatrix},$$

where $R_{d-1} \in O(d-1,\mathbb{R})$, we can achieve $R\eta = (\sin(\theta), 0, \dots, 0, \cos(\theta))^T$ with $\theta = d_S(\eta,e_d)$. Since
$|\eta|_{[d-1]} = |\eta - e_d|_{[d-1]} = |R(\eta - e_d)|_{[d-1]} = |R\eta - e_d|_{[d-1]} = |\sin(\theta)|$, it just remains to prove $|\sin(\theta)| \asymp |\{\theta\}|$,
which is true by Lemma 6.1. □
Lemma 6.3. Let $c > 0$ be a constant. Then we have for all $v,w \in \mathbb{S}^{d-1}$ with $[v]_d \ge c$ and $[w]_d \ge 0$

$$|\{d_S(v,w)\}| \asymp |v - w|.$$

Proof. Under the assumptions there exists $\varepsilon > 0$ dependent on $c$, such that $0 \le d_S(v,w) \le \pi - \varepsilon$. It follows
$|d_S(v,w)| \lesssim |\{d_S(v,w)\}| \le |d_S(v,w)|$. The observation $d_S(v,w) \asymp |v - w|$ finishes the proof. □

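Lemma 6.4. Let $R \in O(d,\mathbb{R})$ be a rotation and $\theta_0 = d_S(e_d, Re_d) \in [0,\pi]$ the angle between the $d$-th unit
vector $e_d \in \mathbb{R}^d$ and its image $Re_d$ under $R$. Then it holds for all $\eta \in \mathbb{S}^{d-1}$

$$|R\eta|_{[d-1]} = \sin(d_S(R\eta,e_d)) \ge \min\{|\sin(d_S(\eta,e_d) + \theta_0)|, \ |\sin(d_S(\eta,e_d) - \theta_0)|\}.$$

Note $d_S(\eta,e_d) = d_S(R\eta,Re_d)$.

Proof. Let $\eta = (\eta_1,\dots,\eta_d)^T \in \mathbb{S}^{d-1}$ and put $\theta_1 := d_S(\eta,e_d) = \arccos(\langle\eta,e_d\rangle) \in [0,\pi]$. The rotation
$R \in O(d,\mathbb{R})$ can be decomposed in the form $R = \tilde{R}R_{\theta_0}$ with $\tilde{R}, R_{\theta_0} \in O(d,\mathbb{R})$ such that

$$\tilde{R} = \begin{pmatrix} R_{d-1} & 0 \\ 0 & 1 \end{pmatrix} \quad \text{and} \quad
R_{\theta_0} = \begin{pmatrix} \cos(\theta_0) & & -\sin(\theta_0) \\ & I_{d-2} & \\ \sin(\theta_0) & & \cos(\theta_0) \end{pmatrix},$$

where $R_{d-1} \in O(d-1,\mathbb{R})$ is some $(d-1)$-dimensional rotation matrix and $I_{d-2}$ is the $(d-2)$-dimensional
identity matrix. The rotation $\tilde{R}$ leaves $|\cdot|_{[d-1]}$ invariant, whence

$$|R\eta|_{[d-1]} = |\tilde{R}R_{\theta_0}\eta|_{[d-1]} = |R_{\theta_0}\eta|_{[d-1]}.$$

Using $\eta_d = \cos(\theta_1)$ and $\eta_2^2 + \dots + \eta_{d-1}^2 = 1 - \eta_1^2 - \eta_d^2$ it further follows

$$\begin{aligned}
|R_{\theta_0}\eta|_{[d-1]}^2 &= (\cos(\theta_0)\eta_1 - \sin(\theta_0)\eta_d)^2 + \eta_2^2 + \dots + \eta_{d-1}^2 \\
&= \cos^2(\theta_0)\eta_1^2 + \sin^2(\theta_0)\cos^2(\theta_1) - 2\cos(\theta_0)\sin(\theta_0)\eta_1\cos(\theta_1) + (1 - \eta_1^2 - \cos^2(\theta_1)) \\
&= 1 - (\eta_1\sin(\theta_0) + \cos(\theta_1)\cos(\theta_0))^2.
\end{aligned}$$

The last expression is a second-degree polynomial in the variable $\eta_1$ with a negative leading coefficient.
Since $\eta_1^2 \le 1 - \eta_d^2 = 1 - \cos^2(\theta_1) = \sin^2(\theta_1)$, the variable $\eta_1$ can take values only in $[-\sin(\theta_1), \sin(\theta_1)]$.
The polynomial attains its minimum on this interval at the endpoints. Hence, we can conclude

$$\begin{aligned}
|R_{\theta_0}\eta|_{[d-1]}^2 &\ge \min_{\epsilon \in \{-1,1\}} \{1 - (\epsilon\sin(\theta_1)\sin(\theta_0) + \cos(\theta_1)\cos(\theta_0))^2\} \\
&= \min_{\epsilon \in \{-1,1\}} \{1 - \cos^2(\theta_1 - \epsilon\theta_0)\} = \min_{\epsilon \in \{-1,1\}} \{\sin^2(\theta_1 - \epsilon\theta_0)\},
\end{aligned}$$

which proves the claim. □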
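
The inequality of Lemma 6.4 can be probed numerically in dimension $d = 3$. The following is only a sanity check under stated assumptions, not a substitute for the proof: rotations are generated as products of three axis rotations, and all function names as well as the tolerance are hypothetical choices.

```python
import math
import random


def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def random_rotation(rng):
    """Product of three axis rotations with random angles (an element of O(3))."""
    a, b, c = (rng.uniform(0.0, 2.0 * math.pi) for _ in range(3))
    rx = [[1, 0, 0], [0, math.cos(a), -math.sin(a)], [0, math.sin(a), math.cos(a)]]
    ry = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]
    rz = [[math.cos(c), -math.sin(c), 0], [math.sin(c), math.cos(c), 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))


def random_unit_vector(rng):
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def clamp(t):
    """Guard acos against rounding slightly outside [-1, 1]."""
    return max(-1.0, min(1.0, t))


def lemma_holds(R, eta, tol=1e-6):
    image = [sum(R[i][k] * eta[k] for k in range(3)) for i in range(3)]
    lhs = math.hypot(image[0], image[1])   # |R eta|_{[d-1]}
    theta0 = math.acos(clamp(R[2][2]))     # angle between e_d and R e_d
    theta1 = math.acos(clamp(eta[2]))      # angle between eta and e_d
    rhs = min(abs(math.sin(theta1 + theta0)), abs(math.sin(theta1 - theta0)))
    return lhs >= rhs - tol


rng = random.Random(0)
print(all(lemma_holds(random_rotation(rng), random_unit_vector(rng)) for _ in range(1000)))
```

Equality is attained, for instance, when $R$ is the identity, which shows that the bound cannot be improved in general.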



6.2 Integral Estimates 

We start with an estimate for the generators in ([S]), which will allow us to work in polar coordinates. 

Lemma 6.5. Let the family of functions {(7A}AgA satisfy ([7]) uniformly for a multi-index p G Ng, and 
assume that there is a constant c > 0 such that s\ > c for all A G A. Then the following estimate holds 
true uniformly for A G A and ^ G R^^ 




min{l,S;,^(l -k ICI)}^ 


(42) 


Proof. We have > min{s;,^^,s^“} |^| > 1^1 uniformly for ^ G R^^ and A G A, since Sa > c > 0 

for every A G A. It follows > sf^ \Rg^R^^f \ = s^^|C|. Further, we observe |A-^^|[d_i] = 

Sa“ ICI[d-i] and = Sa^ |[C]d|. Finally, it holds (|C|) x 1-f |?| and |[^]d| + |^|[d-i] - l^l- Collecting 

all of these estimates, one obtains 
<


+ |[^„,ix-Rex-RvxC]d| +Sa^^ “Va,k-^eA-^¥>A^I[d-i]} 


M 


< 


min{l,SA^(l + ICI)} 


M 




The expression on the right hand side of (14211 can further be estimated by the function 


S\^M,Ni,N2{0 ■ — 


min{l,S;^^(l + Id)} 


M 


(1 + + 4 ”“ \RoAAm)\[,.,]r^ 


ce 


□ 


(43) 


As already discussed in [28], this function can be separated into angular and radial components, allowing 
us to treat these parts independently in the integration later. Since the notation in this article differs 
slightly from the one in [28], we choose to state the lemma once more here, but refer to said article for a 
proof. 

Lemma 6.6. f^28l Lemma 6.4]) Assume that sa > c > 0 for all X £ A. For every M, Ni, N 2 , K £ No 
such that K < N 2 we have with respect to X £ A and ^ the uniform estimate 


min{l,SA^(l + Id)} 


M 




< 


S\,M-K,Ni,k{0- 


Next, we want to estimate the scalar product of two functions of the form (43). Before the actual
result, Lemma 6.10, we need some preparation. This is the part of the proof of Theorem 2.5 which differs
the most from the situation in two dimensions.

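Lemma 6.7. Let $a \ge a' > 0$, $d \in \mathbb{N}$, $d \ge 2$, and $N > 1$. Then we have uniformly for $y \in \mathbb{R}$

$$\int_{\mathbb{R}} \frac{|x|^{d-2} \, dx}{(1 + a|x|)^{N+d-2}(1 + a'|x - y|)^{N+d-2}} \lesssim a^{-(d-1)}(1 + a'|y|)^{-N}.$$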

Proof. Utilizing the following result from Grafakos [H] [Appendix K.l] 
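$$\int_{\mathbb{R}} \frac{dx}{(1 + a|x|)^{N}(1 + a'|x - y|)^{N}} \lesssim \max\{a,a'\}^{-1}(1 + \min\{a,a'\}|y|)^{-N},$$

we can estimate

$$\begin{aligned}
\int_{\mathbb{R}} \frac{|x|^{d-2} \, dx}{(1 + a|x|)^{N+d-2}(1 + a'|x - y|)^{N+d-2}}
&= a^{-(d-2)} \int_{\mathbb{R}} \frac{|ax|^{d-2} \, dx}{(1 + a|x|)^{N+d-2}(1 + a'|x - y|)^{N+d-2}} \\
&\le a^{-(d-2)} \int_{\mathbb{R}} \frac{(1 + |ax|)^{d-2} \, dx}{(1 + a|x|)^{N+d-2}(1 + a'|x - y|)^{N+d-2}} \\
&= a^{-(d-2)} \int_{\mathbb{R}} \frac{dx}{(1 + a|x|)^{N}(1 + a'|x - y|)^{N+d-2}} \\
&\lesssim a^{-(d-2)} \max\{a,a'\}^{-1}(1 + \min\{a,a'\}|y|)^{-N} = a^{-(d-1)}(1 + a'|y|)^{-N}.
\end{aligned}$$

□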


We can immediately deduce the following corollary. 

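Corollary 6.8. Let $a \ge a' > 0$, $d \in \mathbb{N} \setminus \{1\}$, and $N > 1$. Then we have uniformly for $\theta_0 \in \mathbb{R}$

$$\int_0^{\pi} \frac{|\sin^{d-2}(\theta)| \, d\theta}{(1 + a|\sin(\theta)|)^{N+d-2}(1 + a'|\sin(\theta - \theta_0)|)^{N+d-2}} \lesssim a^{-(d-1)}(1 + a'|\{\theta_0\}|)^{-N}.$$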
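Proof. Let us call the integral to be estimated $S$. Since the integrand on the left-hand side is $\pi$-periodic,
we may change the domain of integration to $[-\pi/2, \pi/2]$. Applying Lemma 6.1 we can further conclude

$$S \asymp \int_{-\pi/2}^{\pi/2} \frac{|\theta|^{d-2} \, d\theta}{(1 + a|\theta|)^{N+d-2}(1 + a'|\{\theta - \theta_0\}|)^{N+d-2}}.$$

Since $|\{\theta_0\}| \le \frac{\pi}{2}$ we can estimate

$$S \lesssim \sum_{\vartheta \in \{-\pi,0,\pi\}} \int_{-\pi/2}^{\pi/2} \frac{|\theta|^{d-2} \, d\theta}{(1 + a|\theta|)^{N+d-2}(1 + a'|\theta - (\{\theta_0\} + \vartheta)|)^{N+d-2}}.$$

We now use Lemma 6.7 to estimate this by

$$S \lesssim a^{-(d-1)} \sum_{\vartheta \in \{-\pi,0,\pi\}} (1 + a'|\{\theta_0\} + \vartheta|)^{-N} \lesssim a^{-(d-1)}(1 + a'|\{\theta_0\}|)^{-N}.$$

□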


This result is used to estimate the integral of the angular parts of (f43|l over the sphere 

IL TTjrf —3 

We then have the estimate 


Lemma 6.9. Let a,a' > 0, d G N, d > 2, 9\,9fj, G [0,7r] x [—^ [0: 27r] and N > 1. 
Further, let da denote the standard surface measure on the sphere §' 

f dafq) 


Js.-. (1 + + a' 

< max{a,a'}“*^'^“^^(l + min{a, a'}|{ds(eA, e^)}|) 
where e\ = R'^^Rj^ed and e^ = R'^^Rg^eu- 

Proof. Note the symmetry of the statement with respect to interchanging the entities $a, a'$ and $\lambda, \mu$.
Without loss of generality we can therefore restrict to the case $a \ge a' > 0$.

Since the mapping Re R^p,,^ is an isometry, the integral is equal to 


5 := 


da{T]) 


4-1 (1 + a b|[,_i])^+''-2(l + a'\Rg,Rp,R^^Rlv\[d-i]r+^-^ ' 


For the integration we parameterize the sphere ^ by standard spherical coordinates, i.e. coordinates 
(01,... 9d-2, p) G [0, X [0,27r) such that for rj G 


r]{9,(p) = 


( sin(0i).sin(0d-2) cos((/?)\ 

sin(0i).sin(0d-2) sin((p) 

sin(0i) • • • sin(0d_3) cos(0d_2) 


V 


cos(0i) 


/ 


Observe that {f],ed) = cos(0i) and thus 0i = ds{ri,ed). Also note |f?|[d-i] = |sin(0i)|. Letting 0o := 
ds(eA,e^) G [0,7r] denote the angle between e\ and we have 9o = ds{ed, Re^Rp^R'^^Rj^ed). Since 
Re^Rip^R^^R^ G 0{d,R) we can apply Lemma to estimate \Re^Rp,,R^ Rj r]\[d-i]- We obtain 


5 < 


< 


^ 71- ^71- ^TT sin (0i)sin '^(02)---sin(0d_2)(i0i(i02 ---d 0d-2d7> 

0 Jo Jo (1 + a|sin(0i)|)^+^-2(l + a'min{|sin(0i + 0o)| , |sin(0i - 
1"^ {91) d91 

0 (1 + a| sin( 0 i)|)^+'^“ 2 (']^ _|_ „/ inin{|sin(0i + 0o)| , |sin(0i - 

|sin(0i)|‘^-2d0i 


^ E 

eef-1,1} 


Iq (1 + a\ sin(0i)|)'^+'^“2(i _|_ Q,/ |sin(0i — e0o)|)^'''‘^“^ ' 

Using Corollary 16.81 we finally arrive at 5 < max{a, (l + min{o, a'} |{0o}|) ^■ 


□ 
With this estimate for the angular components in our toolbox, we proceed to prove the main result
concerning the correlation of functions of the form (43).

Lemma 6.10. Let a € [0,1], d € N, d > 2, and M,Ni,N 2 S Nq. Further, let (A, $a) and (A, <1>a) be 
parametrizations with {sx,ex,xx) = $a(A) and = ‘hA(/^) for X € A, p, € A, such that c < sx 

and c < Sfj, for a fixed constant c > 0. Then for A > 0 and B > 1 satisfying 

Ni>^, M+ d> and N 2 > B + d - 2 

the following estimate holds true with an implicit constant independent of X G A and p G A, 

{sxSf,) ^ J Sx,M,Ni,N 2 {x)Sf,,M,Ni,N 2 {x) dx < maxj^, (1 + min{sA,s^}^““|{d§(eA,e^)}|) 

Proof. Without loss of generality we subsequently assume sx < s^. The strategy is to separate the 
integration into an angular and a radial part and estimate these independently. For the estimate of the 
angular part we can use Lemma 16.91 which yields 




(i+„(d i) 


FI 

Jo JS"^- 


Sx,M,Ni,N 2 {'nA)Sfj.,M,Ni,N 2 {'n,r)i^ d(j{r])dr 


with a remaining radial integral 


5 := 


_d min{l,s;,^(l + r)}^min{l,s^i(l + r)}^ 

Jo (I + SaV^i + ' 


dr. 


Note that for the estimate we used the assumptions sx < s^, B > 1 and N 2 > B + d — 2. It remains to 
verify the relation • S < (s^/sa)”"^, or equivalently 


5 < 


(-) 

\sxJ 


-A- 


l + a(d-l) 


To prove this, we split the integration of $S$ into three parts $S_1, S_2, S_3$ corresponding to the integration
ranges $0 \le r \le 1$, $1 \le r \le s_\mu$, and $s_\mu \le r$ respectively.


0 < r < 1 : Here we estimate min {1, ^(1 + r)} < s_^'^(l + r)^ < 2^s^^ and (l + s^^r)^i > 1, 

and similarly for the index p. Hence, the integral over this part can be estimated by 


5, < s-‘^s-^s-^ I dr X sL^^+<^hT^ < ( ^ 


Sx 


-(M+d) 


where the last inequality holds because of the uniform lower bound 0 < c < sa for X G A. Finally observe 
that the assumed inequalities imply M + d > A + _ 

1 < r < We estimate the terms involving p as follows: (1 + > 1 and (r + 1) < 2r. Hence 

min{l,s;i(l + r)}'^ < s;“(l + r)“ < + r)“ < 2^ 5 -^r^. 

For the terms with A’s, we have (1 + s'^^r)^^ > s'^^^r^^ and min {l, Sa A 1- The integral S 2 

hence satisfies 


52 < sFsFs-^ ^M-N,+d-l ^ 


-Ni 


where it was used that M + d > Ni, which implies M + d — Ni — 1 > —1, for the integration. By 
assumption Ai > A + giving the desired result. 
$s_\mu \le r$ : We estimate both terms like the $\lambda$-terms above to obtain


^ -Ni 


poo 

s, < jf dr < < (|) 

The integral converges since > |. Since iVi > A + the proof is finished. 


□ 


6.3 Cancellation Estimate 


Theorem 2.5 provides estimates for the scalar products of $\alpha$-molecules. To derive them we evaluate
these scalar products on the Fourier side, where we can take advantage of cancellation phenomena.
Technically, the method is based on a clever integration by parts involving the following differential
operator, depending on the indices $\lambda$ and $\mu$,

$$\mathcal{L}_{\lambda,\mu} := \mathcal{I} - s_0^{2\alpha}\Delta - \frac{s_0^{2}}{1 + s_0^{2(1-\alpha)}|\{d_S(e_\lambda,e_\mu)\}|^2}\langle e_\lambda, \nabla\rangle^2, \qquad (44)$$

where $s_0 = \min\{s_\lambda, s_\mu\}$, $\mathcal{I}$ is the identity operator, $\nabla$ the gradient and $\Delta$ the standard Laplacian.
Lemma 6.11 shows how $\mathcal{L}_{\lambda,\mu}$ acts on products of functions $a_\lambda$, $b_\mu$, which satisfy (7).

Lemma 6.11. Let a\ and 6^ satisfy © for every multi-index p G Nq with \p\i < L and assume 
s\, Sfj, > c > 0. Then we ean write the expression 


as a finite linear eombination of terms of the form 


PxiK^x^ex Ripx 


-1 

Qt,Sr, 




with functions p\, q^, which satisfy © for all multi indices p G Nq with \p\i < L — 2. 

Proof. For convenience we introduce the operators 0\ := and := A~^^^Rg^R^^. Fur¬ 

ther, we define the functions a\{f,) := a\{0\f) and := b^{O^Sf). We also abbreviate ^a := 0\f and 
iu '■= 0^£,. Taking into account s\ > 1, we observe ||Oa|| 2^.2 = ||A“^|| 2^.2 = inax{sA“, Sa < sf°‘. 
Analogously, it holds ||0;i||2->.2 = ll"4aXll2-J.2 < s^“. Finally, we introduce the ‘transfer’ matrix 

Ta,^ := Re^R^^Rl.Rl G 0{d,R). (45) 

After these remarks we turn to the proof, where we treat the components of -SfA,/j separately. 


I This term causes no pain. 


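$s_0^{2\alpha}\Delta$ By the product rule we have

$$\Delta(\tilde{a}_\lambda\tilde{b}_\mu) = \underbrace{2\langle\nabla\tilde{a}_\lambda, \nabla\tilde{b}_\mu\rangle}_{A} + \underbrace{\tilde{a}_\lambda\Delta\tilde{b}_\mu + \tilde{b}_\mu\Delta\tilde{a}_\lambda}_{B}.$$

In the following we first treat part A and then part B.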

A The chain rule yields VaA(C) = for every ^ G and an analogous formula for 6^. Thus 

we obtain 


(VSa( 0,V^(0) = (OrVaA(6),OjV6^(C^)) = {O^Olyax{fx),Vb^{^^)). 

The expression (O^O^Voa, V5^) is a linear combination of the products dia\djb^, where i,j G {1,..., d}, 
with the entries of the matrix 0^0^ as coefficients. The functions dia\ and djb^ clearly satisfy ([7|) for 
every p G Nq with |p|i < L — 1. Moreover, the entries of the matrix O^Oj^ are bounded in modulus by 
\0^dOlh^2, which in turn obeys the estimate 


|OmOa|| 2->2 — \\^als^R\,U^alsx\V^ 


^ Ma,i,,ll2-i.2||^a,sxll2-i>2 ^ (V'^a) “ < 
where Sq = min{sA, s^}. This shows that the function Sq“A can be written as claimed. 


-2a 
B Due to symmetry it suffices to treat the term bfj_Aa\. Since for ^ G and since

fulfills condition © for every p G Ng with \p\i < L, the function 6^ is a suitable first factor with the 
required properties. Let us investigate the second factor Aa\. 

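The second derivative of $\tilde a_\lambda$ is at each $\xi \in \mathbb{R}^d$ a bilinear mapping $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, which by the chain rule satisfies for $v, w \in \mathbb{R}^d$

\[ \tilde a_\lambda''(\xi)[v,w] = a_\lambda''(\xi_\lambda)[O_\lambda v, O_\lambda w]. \]

Thus, we have the expansion

\[ \Delta \tilde a_\lambda(\xi) = \sum_{i=1}^{d} \tilde a_\lambda''(\xi)[e_i, e_i] = \sum_{i=1}^{d} a_\lambda''(\xi_\lambda)[O_\lambda e_i, O_\lambda e_i]. \]

Let $\rho \in \mathbb{N}_0^d$ be a multi-index with $|\rho|_1 \le L - 2$. Then the partial derivative with respect to $\rho$ of the function $\xi \mapsto \sum_{i=1}^{d} a_\lambda''(\xi)[O_\lambda e_i, O_\lambda e_i]$ clearly exists. It remains to prove the frequency localization (7).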

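In view of $\partial^\rho(a_\lambda'') = (\partial^\rho a_\lambda)''$ we can estimate for every $i \in \{1,\ldots,d\}$ and every $\xi \in \mathbb{R}^d$

\[ s_0^{2\alpha} \, |\partial^\rho a_\lambda''(\xi)[O_\lambda e_i, O_\lambda e_i]| \le s_0^{2\alpha} \, |||\partial^\rho a_\lambda''(\xi)||| \, \|O_\lambda\|_{2\to 2}^2 \le |||\partial^\rho a_\lambda''(\xi)|||. \]

The norm of the bilinear mapping is given by $|||\partial^\rho a_\lambda''(\xi)||| = \sup_{|v|,|w| \le 1} |\partial^\rho a_\lambda''(\xi)[v,w]|$. This is equal to the spectral norm of the corresponding Hesse matrix. Therefore we can deduce $|||\partial^\rho a_\lambda''(\xi)||| \lesssim \sup_{|\beta|_1 = 2} |\partial^\rho \partial^\beta a_\lambda(\xi)|$. The functions $\partial^\rho \partial^\beta a_\lambda$ satisfy (7) for every $\beta \in \mathbb{N}_0^d$ with $|\beta|_1 = 2$ due to the assumption on $a_\lambda$. The required frequency localization follows.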
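The step from the spectral norm of the Hesse matrix to the supremum over second-order partial derivatives rests on the equivalence of the spectral norm and the largest entry in modulus. A small sketch for symmetric $2 \times 2$ matrices, where the eigenvalues are available in closed form, illustrates this; the function names are hypothetical, and the constant 2 is the dimension-dependent constant hidden in the symbol $\lesssim$ for $d = 2$.

```python
import math

def spectral_norm_sym2(h11, h12, h22):
    """Spectral norm of the symmetric 2x2 matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    # The eigenvalues are mean +/- radius; the norm is the larger modulus.
    return max(abs(mean + radius), abs(mean - radius))

def largest_entry(h11, h12, h22):
    """Largest entry in modulus, i.e. the sup over second-order partials."""
    return max(abs(h11), abs(h12), abs(h22))
```

For every symmetric $2 \times 2$ matrix one has `largest_entry <= spectral_norm_sym2 <= 2 * largest_entry`, so both quantities are comparable up to a constant depending only on the dimension.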

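$\boxed{s_0^2 (1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2)^{-1} \langle e_\lambda, \nabla \rangle^2}$ First we put $w_1 := s_0^2$, $w_2 := s_0^{2\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^{-2}$, and $w_3 := s_0^{1+\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^{-1}$ and notice that the pre-factor satisfies

\[ s_0^2 \big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2\big)^{-1} \le \min\{w_1, w_2, w_3\}. \tag{46} \]

The first two estimates are obvious. For the third, recall that $1 + t^2 \ge 2t$ for all $t \in \mathbb{R}$. Hence,

\[ s_0^2 \big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2\big)^{-1} \le s_0^2 \big(2 s_0^{1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|\big)^{-1} \le s_0^{1+\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^{-1}. \]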
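The three bounds on the pre-factor in (46) can be checked numerically. The sketch below assumes that the weights are $w_1 = s_0^2$, $w_2 = s_0^{2\alpha} t^{-2}$ and $w_3 = s_0^{1+\alpha} t^{-1}$, where the hypothetical parameter `t` stands for the quantity $|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| > 0$; the function names are illustrative only.

```python
def prefactor(s0, alpha, t):
    """Pre-factor s0^2 / (1 + s0^(2(1-alpha)) * t^2) of the directional term."""
    return s0 ** 2 / (1.0 + s0 ** (2.0 * (1.0 - alpha)) * t ** 2)

def weights(s0, alpha, t):
    """The three candidate upper bounds w1, w2, w3 (requires t > 0)."""
    w1 = s0 ** 2
    w2 = s0 ** (2.0 * alpha) / t ** 2
    w3 = s0 ** (1.0 + alpha) / t
    return w1, w2, w3

def prefactor_is_dominated(s0, alpha, t):
    """True if the pre-factor is bounded by min(w1, w2, w3)."""
    return prefactor(s0, alpha, t) <= min(weights(s0, alpha, t))
```

The third bound is the one that uses $1 + t^2 \ge 2t$; it even holds with an additional factor $1/2$, which is absorbed into $w_3$.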

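We begin with the product rule, which yields

\[ \langle e_\lambda, \nabla \rangle^2 (\tilde a_\lambda \tilde b_\mu) = \tilde b_\mu \langle e_\lambda, \nabla \rangle^2 \tilde a_\lambda + 2 (\langle e_\lambda, \nabla \rangle \tilde a_\lambda)(\langle e_\lambda, \nabla \rangle \tilde b_\mu) + \tilde a_\lambda \langle e_\lambda, \nabla \rangle^2 \tilde b_\mu. \tag{47} \]

Recall that $e_\lambda = R^T_{\varphi_\lambda} R^T_{\theta_\lambda} e_d$. We calculate with the chain rule for $\xi \in \mathbb{R}^d$

\[ \langle e_\lambda, \nabla \tilde a_\lambda(\xi) \rangle = \langle O_\lambda e_\lambda, \nabla a_\lambda(\xi_\lambda) \rangle = \langle A^{-1}_{\alpha,s_\lambda} e_d, \nabla a_\lambda(\xi_\lambda) \rangle = s_\lambda^{-1} \partial_d a_\lambda(\xi_\lambda), \]

where we used $O_\lambda e_\lambda = A^{-1}_{\alpha,s_\lambda} e_d$. Using the 'transfer' matrix $T_{\lambda,\mu}$ from (45), we similarly obtain

\[ \langle e_\lambda, \nabla \tilde b_\mu(\xi) \rangle = \langle O_\mu e_\lambda, \nabla b_\mu(\xi_\mu) \rangle = \langle A^{-1}_{\alpha,s_\mu} T_{\lambda,\mu} e_d, \nabla b_\mu(\xi_\mu) \rangle. \]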

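Next, we note that $\langle e_\lambda, \nabla \rangle^2 \tilde a_\lambda(\xi) = \tilde a_\lambda''(\xi)[e_\lambda, e_\lambda]$. Together with the chain rule, this implies

\[ \langle e_\lambda, \nabla \rangle^2 \tilde a_\lambda(\xi) = a_\lambda''(\xi_\lambda)[O_\lambda e_\lambda, O_\lambda e_\lambda] = a_\lambda''(\xi_\lambda)[A^{-1}_{\alpha,s_\lambda} e_d, A^{-1}_{\alpha,s_\lambda} e_d] = s_\lambda^{-2} \partial_d^2 a_\lambda(\xi_\lambda). \]

We also obtain

\[ \langle e_\lambda, \nabla \rangle^2 \tilde b_\mu(\xi) = b_\mu''(\xi_\mu)[O_\mu e_\lambda, O_\mu e_\lambda] = \big(\langle A^{-1}_{\alpha,s_\mu} T_{\lambda,\mu} e_d, \nabla \rangle^2 b_\mu\big)(\xi_\mu). \]

Let us henceforth use the abbreviation $\eta := T_{\lambda,\mu} e_d \in \mathbb{S}^{d-1}$. Plugging the above calculations into (47) leads to the following expression for $\langle e_\lambda, \nabla \rangle^2 (\tilde a_\lambda \tilde b_\mu)(\xi)$ at $\xi \in \mathbb{R}^d$:

\[ s_\lambda^{-2} \, b_\mu(\xi_\mu) \cdot \partial_d^2 a_\lambda(\xi_\lambda) + 2 s_\lambda^{-1} \partial_d a_\lambda(\xi_\lambda) \cdot \langle A^{-1}_{\alpha,s_\mu} \eta, \nabla b_\mu(\xi_\mu) \rangle + a_\lambda(\xi_\lambda) \cdot \big(\langle A^{-1}_{\alpha,s_\mu} \eta, \nabla \rangle^2 b_\mu\big)(\xi_\mu). \tag{48} \]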
For the first summand of (48) we consider the product of the functions $s_\lambda^{-2} \partial_d^2 a_\lambda$ and $b_\mu$. Since $s_\lambda^{-2} \le s_0^{-2}$ and in view of (46) the pre-factor $w_1$ is compensated. Due to the assumptions on $a_\lambda$ and $b_\mu$ the product is thus of the desired form.

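Let us put $\eta_{[d-1]} := (\eta_1, \ldots, \eta_{d-1}, 0)^T \in \mathbb{R}^d$ and $\eta_{[d]} := (0, \ldots, 0, \eta_d)^T \in \mathbb{R}^d$ and observe that

\[ A^{-1}_{\alpha,s_\mu} \eta = A^{-1}_{\alpha,s_\mu}(\eta_{[d-1]} + \eta_{[d]}) = s_\mu^{-\alpha} \eta_{[d-1]} + s_\mu^{-1} \eta_{[d]}. \]

The second summand of (48) then becomes, up to the factor 2,

\[ \partial_d a_\lambda(\xi_\lambda) \cdot \big( s_\lambda^{-1} s_\mu^{-\alpha} \langle \eta_{[d-1]}, \nabla b_\mu(\xi_\mu) \rangle + s_\lambda^{-1} s_\mu^{-1} \eta_d \, \partial_d b_\mu(\xi_\mu) \big). \]

We choose the function $\partial_d a_\lambda$ as the first factor, which clearly has the required properties, and the function

\[ \xi \mapsto s_\lambda^{-1} s_\mu^{-\alpha} \langle \eta_{[d-1]}, \nabla b_\mu(\xi) \rangle + s_\lambda^{-1} s_\mu^{-1} \eta_d \, \partial_d b_\mu(\xi) \]

as the second factor. The second component of this function causes no problems because $|\eta_d| \le 1$ and the pre-factor $w_1$ is compensated due to $(s_\lambda s_\mu)^{-1} \le s_0^{-2}$. To deal with the other term, notice that by Lemma 6.2 $|\eta_{[d-1]}| = |\eta|_{[d-1]} \asymp |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$. Thus

\[ s_\lambda^{-1} s_\mu^{-\alpha} \, |\langle \eta_{[d-1]}, \nabla b_\mu \rangle| \lesssim s_0^{-1-\alpha} \, |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| \, |\nabla b_\mu|. \]

The fact that $\partial_j b_\mu$, $j \in \{1, \ldots, d\}$, satisfy (7) by assumption, and that $s_0^{-1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$ compensates $w_3$, implies that also the first component satisfies the required properties.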

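Let us turn to the last summand of (48). The first factor $a_\lambda$ is of the desired form. For the second factor we expand the function $\langle A^{-1}_{\alpha,s_\mu} \eta, \nabla \rangle^2 b_\mu$ in the form

\[ s_\mu^{-2\alpha} \langle \eta_{[d-1]}, \nabla \rangle^2 b_\mu + 2 s_\mu^{-1-\alpha} \eta_d \langle \eta_{[d-1]}, \nabla \rangle \partial_d b_\mu + s_\mu^{-2} \eta_d^2 \, \partial_d^2 b_\mu. \]

Its partial derivatives of order $\rho \in \mathbb{N}_0^d$ with $|\rho|_1 \le L - 2$ clearly exist, and we get the estimate

\[ |\langle A^{-1}_{\alpha,s_\mu} \eta, \nabla \rangle^2 \partial^\rho b_\mu| \lesssim s_0^{-2\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 \sum_{i,l=1}^{d-1} |\partial_i \partial_l \partial^\rho b_\mu| + 2 s_0^{-1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}| \, |\nabla \partial_d \partial^\rho b_\mu| + s_0^{-2} |\partial_d^2 \partial^\rho b_\mu|. \]

Here we again used that $|\eta_{[d-1]}| \asymp |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$ according to Lemma 6.2. This estimate completes the proof, taking into account the estimate (46) of the pre-factor and the fact that the partial derivatives of $b_\mu$ up to order $L$ satisfy (7). □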

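6.4 Proof of Theorem 2.5

At last we have all the tools available to prove Theorem 2.5. Write $\Delta x := x_\mu - x_\lambda$. An application of the Plancherel identity yields

\[ \langle m_\lambda, p_\mu \rangle = \langle \hat m_\lambda, \hat p_\mu \rangle = (s_\lambda s_\mu)^{-\frac{\alpha(d-1)+1}{2}} \int_{\mathbb{R}^d} \hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)} \, \exp(2\pi i \langle \xi, \Delta x \rangle) \, d\xi \]

for two $\alpha$-molecules $m_\lambda$ and $p_\mu$ with respective generators $a_\lambda$ and $b_\mu$. According to Lemma 6.3 the Fourier transforms of the generators satisfy (7) for every $\rho \in \mathbb{N}_0^d$ with $|\rho|_1 \le L$.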

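Next, we want to exploit cancellation. For this we utilize the differential operator $\mathcal{L}_{\lambda,\mu}$ from (43). First, we observe that partial integration yields

\[ \int_{\mathbb{R}^d} \mathcal{L}^N_{\lambda,\mu}\big(\exp(2\pi i \langle \xi, \Delta x \rangle)\big) \, \hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)} \, d\xi \]
\[ = \int_{\mathbb{R}^d} \exp(2\pi i \langle \xi, \Delta x \rangle) \, \mathcal{L}^N_{\lambda,\mu}\Big(\hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)}\Big) \, d\xi, \]

since the boundary terms vanish due to the decay properties (7) of the generators and their derivatives.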
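Note that we assume > d/2 and $L \ge 2N$. Second, we calculate for $\xi \in \mathbb{R}^d$

\[ \mathcal{L}^N_{\lambda,\mu}\big(\exp(2\pi i \langle \xi, \Delta x \rangle)\big) = \Big(1 + 4\pi^2 s_0^{2\alpha} |\Delta x|^2 + \frac{4\pi^2 s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2}\Big)^{N} \exp(2\pi i \langle \xi, \Delta x \rangle). \]

Consequently, we have

\[ \langle m_\lambda, p_\mu \rangle = \Big(1 + 4\pi^2 s_0^{2\alpha} |\Delta x|^2 + \frac{4\pi^2 s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2}\Big)^{-N} S_{\lambda,\mu} \]

with

\[ S_{\lambda,\mu} := (s_\lambda s_\mu)^{-\frac{\alpha(d-1)+1}{2}} \int_{\mathbb{R}^d} \mathcal{L}^N_{\lambda,\mu}\Big(\hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)}\Big) \exp(2\pi i \langle \xi, \Delta x \rangle) \, d\xi. \]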
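The cancellation argument relies on the plane wave $\xi \mapsto \exp(2\pi i \langle \xi, \Delta x \rangle)$ being an eigenfunction of the differential operator. The following sketch checks this with finite differences for $d = 2$ and a single application of the operator. It assumes that the operator from (43) has the form $I - s_0^{2\alpha} \Delta - c \, \langle e_\lambda, \nabla \rangle^2$ with $c = s_0^2 / (1 + s_0^{2(1-\alpha)} t^2)$, where the hypothetical parameter `t` stands for $|\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$; all function names are illustrative.

```python
import cmath
import math

def plane_wave(xi, dx):
    """exp(2*pi*i*<xi, dx>) for points xi and dx in R^2."""
    return cmath.exp(2j * math.pi * (xi[0] * dx[0] + xi[1] * dx[1]))

def second_directional_derivative(f, xi, direction, h=1e-4):
    """Central finite difference for the second derivative of f along `direction`."""
    fwd = f((xi[0] + h * direction[0], xi[1] + h * direction[1]))
    bwd = f((xi[0] - h * direction[0], xi[1] - h * direction[1]))
    return (fwd - 2.0 * f(xi) + bwd) / h ** 2

def apply_operator(f, xi, e_lam, s0, alpha, t):
    """Assumed operator: I - s0^(2 alpha) * Laplacian - c * <e_lam, grad>^2."""
    c = s0 ** 2 / (1.0 + s0 ** (2.0 * (1.0 - alpha)) * t ** 2)
    laplacian = (second_directional_derivative(f, xi, (1.0, 0.0))
                 + second_directional_derivative(f, xi, (0.0, 1.0)))
    directional = second_directional_derivative(f, xi, e_lam)
    return f(xi) - s0 ** (2.0 * alpha) * laplacian - c * directional

def symbol(dx, e_lam, s0, alpha, t):
    """Expected eigenvalue 1 + 4 pi^2 s0^(2 alpha) |dx|^2 + 4 pi^2 c <e_lam, dx>^2."""
    c = s0 ** 2 / (1.0 + s0 ** (2.0 * (1.0 - alpha)) * t ** 2)
    dot = e_lam[0] * dx[0] + e_lam[1] * dx[1]
    norm_sq = dx[0] ** 2 + dx[1] ** 2
    return 1.0 + 4.0 * math.pi ** 2 * (s0 ** (2.0 * alpha) * norm_sq + c * dot ** 2)
```

Dividing `apply_operator(f, xi, ...)` by `f(xi)` for `f = lambda p: plane_wave(p, dx)` should reproduce `symbol(dx, ...)` up to the discretization error; applying the operator $N$ times then yields the $N$-th power of this factor.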

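Since $L \ge 2N$ by assumption, Lemma 6.11 can iteratively be applied $N$ times, and we conclude that

\[ \mathcal{L}^N_{\lambda,\mu}\Big(\hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)}\Big) \]

can be written as a finite linear combination of terms of the form

\[ p_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, q_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi), \]

where $p_\lambda$ and $q_\mu$ satisfy (7) (for the multi-index just containing zeros).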

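Using Lemma 6.5 and putting $K = 2N + d - 2 \le N_2$ in Lemma 6.6 then yields

\[ \Big| \mathcal{L}^N_{\lambda,\mu}\Big(\hat a_\lambda(A^{-1}_{\alpha,s_\lambda} R_{\theta_\lambda} R_{\varphi_\lambda} \xi) \, \overline{\hat b_\mu(A^{-1}_{\alpha,s_\mu} R_{\theta_\mu} R_{\varphi_\mu} \xi)}\Big) \Big| \lesssim S_{\lambda, M-(2N+d-2), N_1, 2N+d-2}(\xi) \, S_{\mu, M-(2N+d-2), N_1, 2N+d-2}(\xi). \]

Due to the assumptions, we can further choose a number $\tilde N \le N_1$ which satisfies

\[ (M - (2N + d - 2)) + d > \tilde N \ge N + \frac{1 + \alpha(d-1)}{2}. \tag{49} \]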


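Since $\tilde N \le N_1$ we have the estimate $S_{\eta, M-(2N+d-2), N_1, 2N+d-2} \le S_{\eta, M-(2N+d-2), \tilde N, 2N+d-2}$ for $\eta \in \{\lambda, \mu\}$. Hence, we obtain

\[ |S_{\lambda,\mu}| \lesssim (s_\lambda s_\mu)^{-\frac{\alpha(d-1)+1}{2}} \int_{\mathbb{R}^d} S_{\lambda, M-(2N+d-2), N_1, 2N+d-2}(\xi) \, S_{\mu, M-(2N+d-2), N_1, 2N+d-2}(\xi) \, d\xi \]
\[ \le (s_\lambda s_\mu)^{-\frac{\alpha(d-1)+1}{2}} \int_{\mathbb{R}^d} S_{\lambda, M-(2N+d-2), \tilde N, 2N+d-2}(\xi) \, S_{\mu, M-(2N+d-2), \tilde N, 2N+d-2}(\xi) \, d\xi \]
\[ \lesssim \max\Big\{\frac{s_\lambda}{s_\mu}, \frac{s_\mu}{s_\lambda}\Big\}^{-N} \big(1 + s_0^{1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|\big)^{-2N}. \]

Here we used (49) and Lemma 6.10 in the last line (with this $\tilde N$, with $M - (2N + d - 2)$ in place of $M$, and with $A = \tilde N$ and $B = 2N$; note that $B \ge 1$ and $A > 0$ since $N \ge 1$).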

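Altogether, we arrive at the desired estimate

\[ |\langle m_\lambda, p_\mu \rangle| \lesssim \max\Big\{\frac{s_\lambda}{s_\mu}, \frac{s_\mu}{s_\lambda}\Big\}^{-N} \Big(1 + s_0^{2\alpha} |\Delta x|^2 + \frac{s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2}\Big)^{-N} \big(1 + s_0^{1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|\big)^{-2N} \]
\[ \lesssim \max\Big\{\frac{s_\lambda}{s_\mu}, \frac{s_\mu}{s_\lambda}\Big\}^{-N} \Big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0^{2\alpha} |\Delta x|^2 + \frac{s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2}\Big)^{-N} \]
\[ \lesssim \omega_\alpha(\lambda, \mu)^{-N}. \]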
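For the last estimate observe that the inequality between the arithmetic and the geometric mean

\[ \big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2\big) + \frac{s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2} \ge 2 s_0 |\langle e_\lambda, \Delta x \rangle| \]

implies

\[ 1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0^{2\alpha} |\Delta x|^2 + \frac{s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2} \]
\[ \ge \frac{1}{2} \Big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0^{2\alpha} |\Delta x|^2\Big) + \frac{1}{2} \Big(1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + \frac{s_0^2 \langle e_\lambda, \Delta x \rangle^2}{1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2}\Big) \]
\[ \gtrsim 1 + s_0^{2(1-\alpha)} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|^2 + s_0^{2\alpha} |\Delta x|^2 + s_0 |\langle e_\lambda, \Delta x \rangle| = 1 + d_\alpha(\lambda, \mu). \]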


This concludes the proof. 
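The final comparison between the symbol of the differential operator and the index distance can be checked numerically. The sketch below uses the hypothetical abbreviations $x = s_0^{1-\alpha} |\{d_{\mathbb{S}}(e_\lambda, e_\mu)\}|$, $y = s_0^{2\alpha} |\Delta x|^2$ and $z = s_0 \langle e_\lambda, \Delta x \rangle$, and it assumes that the constant hidden in the last step of the chain can be taken as $1/2$, which follows from the arithmetic-geometric mean inequality.

```python
def symbol_side(x, y, z):
    """1 + x^2 + y + z^2 / (1 + x^2): the expression produced by the operator."""
    return 1.0 + x * x + y + z * z / (1.0 + x * x)

def index_distance_side(x, y, z):
    """1 + x^2 + y + |z|: the expression 1 + d_alpha in the abbreviations above."""
    return 1.0 + x * x + y + abs(z)

def chain_holds(x, y, z):
    """Check symbol_side >= (1/2) * index_distance_side for y >= 0."""
    return symbol_side(x, y, z) >= 0.5 * index_distance_side(x, y, z)
```

Evaluating `chain_holds` on a grid of values with $y \ge 0$ gives a quick plausibility check of the last estimate; it is of course no substitute for the argument above.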